Abstract

In this paper we will consider the estimation of a monotone regression (or density) function in a fixed point by the least squares (Grenander) estimator. We will show that this estimator is fully adaptive, in the sense that the attained rate is given by a functional relation using the underlying function $f_0$, and not by some smoothness parameter, and that this rate is optimal when considering the class of all monotone functions, in the sense that there exists a sequence of alternative monotone functions $f_1$, such that no other estimator can attain a better rate for both $f_0$ and $f_1$. We also show that under mild conditions the estimator attains the same rate in $L^q$ sense, and we give general conditions for which we can calculate a (non-standard) limiting distribution for the estimator.

1 Introduction and results 

There exists an extensive literature on the problem of estimating a monotone increasing regression function or monotone decreasing density. We will consider the NPMLE or Grenander estimator for a monotone density, see [Grenander (1956)], and the least squares estimator for a monotone regression function. Prakasa Rao obtained the rate and the limiting distribution for the Grenander estimator in a fixed point in [Prakasa Rao (1969)], and in [Brunk (1970)] a similar result was obtained for the least squares estimator. Results for global measures of convergence were obtained in [Groeneboom, Hooghiemstra, Lopuhaa (1999)] and [Kulikov, Lopuhaa (2005)] for the density case, and in [Durot (2002)] for the regression case. A unified approach that incorporates some other well known monotone estimators is given in [Durot (2007)]. A common problem with these results is that they can only be proved under quite strong conditions, in which case there exist other non-isotonic estimators with faster rates.

Another approach, which addresses adaptivity, can be found in [Kang, Low (2002)]. Here the authors define an estimation procedure for $f_0(0)$, where $f_0$ is a monotone regression function in the white noise model, that is rate-adaptive in a minimax sense, for any $L_q$ ($q > 1$) loss function, with respect to a Lipschitz parameter $\alpha$. A serious drawback of this procedure compared to the estimator we consider is that it does not, in general, give a monotone function as an estimate when the procedure is applied to an interval of fixed points. Furthermore, the rate is described in terms of a Lipschitz parameter, not allowing for fast rates when the function $f_0$ has derivative 0 in 0, nor for slow rates when the function is not Lipschitz for any parameter value, nor for rates that cannot be described by the Lipschitz parameter alone (think of logarithmic corrections).



In this paper we will derive the rate at which the least squares estimator (or Grenander estimator) estimates $f_0(0)$, for a completely general monotone function $f_0$ that is continuous in 0, for four different models: the white noise model, measurements on a grid (not necessarily with normal errors), measurements on random design points and a sample from a decreasing density. This rate is defined in terms of the probabilistic error. This means that if we for example consider the white noise model

\[ Y(t) = \int_0^t f_0(s)\,ds + n^{-1/2} W(t) \]

for $t \in [-1, 1]$, and we fix some $0 < \alpha < 1$, we get a rate $a_n$ that satisfies

\[ P\big(|\hat f(0) - f_0(0)| > a_n\big) \le \alpha. \]
This way of determining a rate is essential if we wish to get the full generality of our results, as was also observed in [Cai, Low (2006)]. Our rate is defined in terms of a functional relation involving the function $f_0$. It turns out that the rate is similar for all four models, and to define it in the white noise case, we assume without loss of generality that $f_0(0) = 0$ and define

\[ F_0(t) = \int_0^t f_0(s)\,ds. \]

Note that this function is convex, and due to the continuity of $f_0$ in 0 (without which we cannot estimate $f_0(0)$ consistently), we have that $F_0'(0) = 0$. Then we fix $C > 0$ and we define $a, r_a > 0$ and $b, r_b > 0$ depending on $n$ such that

\[ F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C n^{-1/2}. \tag{1.1} \]

Note that except for the simpler case where $F_0(-1) = 0$ or $F_0(1) = 0$, these equations always have a unique solution for $n$ large enough. Define the functions $\psi_l$ and $\psi_r$ by

\[ \psi_l(s) = \limsup_{t\downarrow 0} \frac{F_0(-st)}{F_0(-t)} \quad\text{and}\quad \psi_r(s) = \limsup_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} \qquad (s \in [0,1]). \tag{1.2} \]

If $F_0(t) = 0$ for some $t > 0$, we define $\psi_r(s) = 0$ for $s \in [0, 1)$ and $\psi_r(1) = 1$, and likewise for $\psi_l$. It is not hard to show that $\psi_r$ and $\psi_l$ are convex increasing functions, such that $0 \le \psi_r(s), \psi_l(s) \le s$. We will show the following theorem:

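Theorem 1.1 With the notations as above, we have that

\[ \limsup_{n\to\infty} P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} W(s) - Cs < \inf_{0\le s\le 1} W(s) - C(s - \psi_r(s)) \Big) \]

and

\[ \limsup_{n\to\infty} P\big(\hat f(0) < -b\big) \le P\Big( \inf_{s\le 0} W(s) - Cs < \inf_{0\le s\le 1} W(s) - C(s - \psi_l(s)) \Big). \]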

The actual rate is therefore given by $\max(a, b)$, since the probability on the right hand side always goes to zero as $C \to \infty$, and when $\psi_r$ or $\psi_l$ differs from the identity function, the respective probability goes to zero exponentially fast in $C$. This will also allow us to establish $L^q$ convergence of the estimator in the following sense. Define the increasing function

\[ G_0(t) = F_0(t)/t. \]

If $F_0(1) > 0$, it is possible to define $G_0^{-1}$ as a strictly increasing continuous function on $[0, F_0(1)]$. Choose $\delta$ small enough and define

\[ H_0(a) = \begin{cases} a \sqrt{|G_0^{-1}(a)|} & \text{if } |a| \le a_\delta, \\ a - \mathrm{sgn}(a)\, a_\delta + H_0(\mathrm{sgn}(a)\, a_\delta) & \text{if } |a| > a_\delta. \end{cases} \]

The connection with the rate equations (1.1) is given by $G_0^{-1}(a) = r_a$ and $H_0(a) = C n^{-1/2}$ for $0 < a < a_\delta$. We will show the following theorem:

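Theorem 1.2 Suppose $\psi_r(s) < s$ for some $s \in (0,1)$. Let $\chi : [0, \infty) \to [0, \infty)$ be such that for some constants $K > 0$ and $m \ge 1$

\[ \chi(a) \le H_0(a) \ \text{ for } a \le a_\delta, \qquad \chi(a) \le K H_0(a)^m \ \text{ for } a \ge a_\delta. \]

Then there exist constants $L_1, L_2, \gamma, n_0 > 0$ such that for all $n \ge n_0$ and $C > 0$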

\[ P\big( n^{1/2} \chi(\hat f(0)_+) > C \big) \le L_1 e^{-L_2 C^{\gamma}}. \]

We can use this to show that if there exist $\alpha, M > 0$ such that $f_0(x) \le M x^{\alpha}$ for positive $x$ in a neighborhood of 0, then for any $q > 0$

\[ \limsup_{n\to\infty} E\Big( \big( n^{\alpha/(2\alpha+1)} (\hat f(0) - f_0(0))_+ \big)^q \Big) < +\infty. \]

Here we use the notation $x_+ = \max(0, x)$. Note that controlling the behavior of $f_0$ to the right of 0 only controls the "overshoot" of the estimator.

We will also determine weak regularity conditions for $F_0$, such that we can determine the limiting distribution of $\hat f(0)$. Suppose

\[ \lim_{n\to\infty} \frac{r_a}{r_b} = \gamma \in [0, \infty). \]

This says that the rates to the left and to the right of 0 are well behaved with respect to each other, which is a natural condition for a limiting distribution to exist. Furthermore, suppose $F_0$ is regularly varying near 0: there exists $\alpha > 1$, such that for all $s > 0$

\[ \lim_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} = s^{\alpha}. \]

This says that $F_0$ scales properly near 0, which is another natural condition: we don't want different behavior of $F_0$ for different scales. We will prove the following theorem:
Theorem 1.3 Let $W_s$ ($s \in \mathbb{R}$) denote two-sided standard Brownian motion, and define the process

\[ X(s) = \begin{cases} W_s + s^{\alpha} & \text{for } s \ge 0, \\ W_s + \gamma^{\alpha - 1/2} |s|^{\alpha} & \text{for } s < 0, \end{cases} \]

and the process $\hat X(s)$ as the greatest convex minorant of $X$. With the conditions given above, we have that

\[ \frac{\hat f(0)_+}{H_0^{-1}(n^{-1/2})} \xrightarrow{d} \frac{d\hat X}{ds}(0)_+ \quad\text{and}\quad \frac{\hat f(0)_-}{-H_0^{-1}(-n^{-1/2})} \xrightarrow{d} \frac{d\hat X}{ds}(0)_-. \]
Finally we will show that the rate for $\hat f(0)$ is local asymptotic minimax:

Theorem 1.4 Choose two significance levels $\alpha \in (0, 1)$ and $\beta \in (0, 1/2)$. There exists $\eta > 0$, such that for all $n$ large enough, we can find a monotone function $f_1$ (close to $f_0$), and we can find a rate $\gamma_n$ with

\[ \limsup_{n\to\infty} \max_{i=0,1} P_{f_i}\big( |\hat f(0) - f_i(0)| > \gamma_n \big) \le \alpha \]

and

\[ \liminf_{n\to\infty} \inf_{\theta} \max_{i=0,1} P_{f_i}\big( |\theta(Y) - f_i(0)| > \eta \cdot \gamma_n \big) \ge \beta, \]

where $\theta(Y)$ is any estimator of $f(0)$ based on the data $Y$.

This says that $\hat f(0)$ attains a certain rate $\gamma_n$ for both $f_0$ and the sequence of alternatives $f_1$ (of course we take $\gamma_n$ a constant times $\max(a, b)$), and no other estimator can do significantly better for both $f_0$ and $f_1$ simultaneously. This way of describing optimality was inspired by a talk in
Oberwolfach, given by Tony Cai and Mark Low, although their concept looked at the Lg-risk and 
it did not require that $\hat f(0)$ estimate $f_1$ with the same rate.

We would like to give some feel for Equations (1.1). Suppose $f_0$ is Lipschitz continuous in 0 with parameter $\alpha > 0$, so for $x$ in a neighbourhood of 0, we have (remember that $f_0(0) = 0$)

\[ |f_0(x)| \lesssim |x|^{\alpha}. \]

Here, $g(x) \lesssim h(x)$ denotes that there exists a constant $M > 0$ such that $g(x) \le M h(x)$ for all relevant $x$. Then $F_0(x) \lesssim |x|^{\alpha+1}$, so (1.1) gives us

\[ a r_a \lesssim r_a^{\alpha+1}. \]

This means that $r_a^{-1/2} \lesssim a^{-1/(2\alpha)}$. Together with the second equality for $a$ in (1.1) this leads to

\[ a \lesssim n^{-\frac{\alpha}{2\alpha+1}}. \]

For $b$ we can derive the same bound. This corresponds to the rate found in [Kang, Low (2002)].
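
The Lipschitz computation above can be checked numerically in the one case where the rate equations (1.1) have a closed form. The sketch below assumes $f_0(x) = x^{\alpha}$ for $x \ge 0$, so that $F_0(t) = t^{\alpha+1}/(\alpha+1)$; this choice and the function names `power_law_rate` and `loglog_slope` are illustrative and do not come from the paper.

```python
import math

def power_law_rate(alpha, C, n):
    """Solve F0(r) = a*r and sqrt(r)*a = C/sqrt(n) for the assumed
    F0(t) = t**(alpha+1)/(alpha+1), i.e. f0(x) = x**alpha on x >= 0."""
    # a = F0(r)/r = r**alpha/(alpha+1), hence r**(alpha+1/2) = (alpha+1)*C/sqrt(n)
    r = ((alpha + 1.0) * C / math.sqrt(n)) ** (1.0 / (alpha + 0.5))
    a = r ** alpha / (alpha + 1.0)
    return a, r

def loglog_slope(alpha, C=1.0, n1=10**4, n2=10**8):
    """Slope of log(a) against log(n); expected to equal -alpha/(2*alpha+1)."""
    a1, _ = power_law_rate(alpha, C, n1)
    a2, _ = power_law_rate(alpha, C, n2)
    return (math.log(a2) - math.log(a1)) / (math.log(n2) - math.log(n1))
```

For $\alpha = 1$ the slope is $-1/3$, the familiar cube-root rate; for $\alpha = 1/2$ it is $-1/4$.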
Another interesting case is when $\lim_{a\downarrow 0} r_a = r_0 > 0$. This means that $f_0$ is flat to the right of 0 on the interval $[0, r_0)$. Then $r_a \ge r_0$, and (1.1) gives

\[ a = C n^{-1/2} r_a^{-1/2} \le C n^{-1/2} r_0^{-1/2}, \]

so this corresponds to a parametric rate.



The rest of the paper is organized as follows: Section 2 considers the functions $\psi_r$ and $\psi_l$. Section 3 deals with the white noise model, for which we will prove all the above results. Sections 4, 5 and 6 each deal with one of the other three models we will consider, but we will only formulate and prove the corresponding versions of Theorem 1.1, giving the rate, and Theorem 1.4, showing the optimality of the rate. The other theorems, concerning the $L^q$ convergence and the limiting distribution, require some weak technical conditions, but the ideas are the same as for the white noise model, and are not worked out in this paper.



2 The functions $\psi_r$ and $\psi_l$

In this section we will take a closer look at the functions $\psi_r$ and $\psi_l$ defined in (1.2) in the Introduction. We will concentrate on $\psi_r$, since completely analogous statements will hold for $\psi_l$. Since the function $s \mapsto F_0(st)$ is convex and increasing for all $t > 0$, we get that $\psi_r(s)$ is also an increasing and convex function on $[0, 1]$ (this is true for the limsup of convex functions, not necessarily for the liminf). Furthermore, we clearly have that $\psi_r(0) = 0$ and $\psi_r(1) = 1$. Finally, since $F_0$ is convex, we know that for $s \in [0, 1]$,

\[ F_0(st) \le s F_0(t) + (1 - s) F_0(0) = s F_0(t). \]

This shows that for any $F_0$ we have that $\psi_r(s) \le s$.
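
Two explicit choices of $F_0$ may help to see the range of possible $\psi_r$. Both are illustrative assumptions and are not examples taken from the paper; only the behaviour of $F_0$ near 0 matters for the limsup.

```latex
% Example 1 (assumed): F_0(t) = t^{\alpha} on [0,1] with \alpha > 1.
\[
  \frac{F_0(st)}{F_0(t)} = s^{\alpha}
  \quad\Longrightarrow\quad
  \psi_r(s) = s^{\alpha} \le s \qquad (s \in [0,1]).
\]
% Example 2 (assumed): F_0(t) = e^{-1/t} for t in (0,1/2], F_0(0) = 0.
% F_0''(t) = e^{-1/t} (1 - 2t) / t^4 >= 0 on (0,1/2], so F_0 is convex there.
\[
  \frac{F_0(st)}{F_0(t)}
  = \exp\!\Big( -\frac{1}{t}\Big(\frac{1}{s} - 1\Big) \Big)
  \;\longrightarrow\; 0 \quad (t \downarrow 0)
  \qquad\text{for } s \in (0,1),
\]
% so \psi_r(s) = 0 on [0,1) and \psi_r(1) = 1.
```

In the first example $\psi_r$ is a power function; in the second, $F_0$ is so flat at 0 that $\psi_r$ vanishes on $[0,1)$ even though $F_0(t) > 0$ for all $t > 0$.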

Lemma 2.1 For each $\tau \in [0, 1)$, there exists a positive continuous increasing function $\eta$ with

\[ F_0(t) = 0 \ \Rightarrow\ \eta(t) = 0 \qquad (\forall t \in [0,1]), \]

such that for all $t \in (0, 1]$ and for all $s \in [0, \tau]$

\[ F_0(st) \le (\psi_r(s) + \eta(t)) F_0(t). \]

Proof: Suppose $F_0(t) > 0$ for all $t > 0$. Define the auxiliary functions

\[ G_t(s) = \sup_{0 < u \le t} \frac{F_0(su)}{F_0(u)}. \]

These functions are all convex and they decrease pointwise to $\psi_r$ on $[0, 1]$. Since $G_t(0) = 0$ for all $t > 0$, we conclude that $G_t$ converges uniformly to $\psi_r$ on $[0, \tau]$. Define

\[ \eta(0) = 0 \quad\text{and}\quad \eta(t) = \sup_{s\in[0,\tau]} |\psi_r(s) - G_t(s)| \qquad (t \in (0, 1]) \]

and note that

\[ \frac{F_0(st)}{F_0(t)} \le G_t(s) \le \psi_r(s) + \eta(t), \]

to conclude the statement of the lemma (note that $\psi_r$ is continuous on $[0, \tau]$, so $\eta$ is indeed a continuous increasing function). Now suppose that $F_0(t) = 0$ for some $t > 0$. We defined $\psi_r(s) = 0$ for $s \in [0, 1)$ in this case. Define

\[ r_0 = \sup\{t \in [0,1] : F_0(t) = 0\}. \]

If $r_0 = 1$, then the statement of the lemma holds with $\eta = 0$. Suppose $r_0 < 1$. Since $s \le \tau$, we have for each $t \le r_0/\tau$, $F_0(st) = 0$. Define $\eta(t) = 0$ for $t \in [0, r_0]$, $\eta(t) = 1$ for $t \in [r_0/\tau, 1]$, and continuous in between, and the statement of the lemma holds trivially. □

Let $\{W_s : s \in \mathbb{R}\}$ be a two-sided Brownian motion. We will encounter the probabilities in the next lemma throughout the rest of the paper.

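Lemma 2.2 For any $F_0$ we have that

\[ P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big) \le \frac{1}{\sqrt{2\pi}\, C}. \]

If there exists $s \in (0, 1)$ such that $\psi_r(s) < s$, then there exist $\tau \in (0, 1)$ and $\rho \in (0, 1]$ such that

\[ P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big) \le \sqrt{\frac{2}{\pi\tau}}\, \frac{1}{C\rho(2-\rho)}\, e^{-C^2 \tau \rho^2/2}. \]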



Proof: First note that the left-hand side and the right-hand side of two-sided Brownian motion are independent. It is therefore enough to consider the two sides within the probability separately. It is well known that

\[ P\Big( \inf_{s\le 0} W_s - Cs < -v \Big) = P\Big( \sup_{s\ge 0} W_s - Cs > v \Big) = e^{-2Cv}. \]

This follows from the hitting time of a linear boundary. Since for all $F_0$ we have that $\psi_r(s) \le s$, we also need that

\[ P\Big( \inf_{0\le s\le 1} W_s < -w \Big) = P\Big( \sup_{0\le s\le 1} W_s > w \Big) = 2(1 - \Phi(w)), \]

where $\Phi$ is the distribution function of the standard normal distribution. We get

\[
\begin{aligned}
P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big)
&\le P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s \Big) \\
&= \frac{2}{\sqrt{2\pi}} \int_0^{\infty} e^{-2Cw} e^{-w^2/2}\, dw \\
&= 2 e^{2C^2} (1 - \Phi(2C)) \\
&\le \frac{1}{\sqrt{2\pi}\, C}.
\end{aligned}
\]
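
As a sanity check on the first part of the proof, the closed form $2e^{2C^2}(1-\Phi(2C))$ can be compared numerically with the bound $1/(\sqrt{2\pi}\,C)$, which follows from the standard Gaussian tail inequality $1-\Phi(x) \le \varphi(x)/x$. The sketch below uses only the standard library; the helper names are hypothetical, and the naive evaluation of $e^{2C^2}$ is only meant for moderate values of $C$.

```python
import math

def std_normal_sf(x):
    """Upper tail 1 - Phi(x) of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def exact_probability(C):
    """2*exp(2*C^2)*(1 - Phi(2*C)), the value of
    P(inf_{s<=0} (W_s - C*s) < inf_{0<=s<=1} W_s) derived in the proof."""
    return 2.0 * math.exp(2.0 * C * C) * std_normal_sf(2.0 * C)

def mills_bound(C):
    """Upper bound 1/(sqrt(2*pi)*C) obtained from 1 - Phi(x) <= phi(x)/x."""
    return 1.0 / (math.sqrt(2.0 * math.pi) * C)

if __name__ == "__main__":
    for C in (0.5, 1.0, 2.0, 4.0):
        print(C, exact_probability(C), mills_bound(C))
```

The bound is loose for small $C$ (it exceeds 1 when $C < 1/\sqrt{2\pi}$) and becomes sharp as $C$ grows.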

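Now suppose that for some $s \in (0, 1)$, $\psi_r(s) < s$. Since $\psi_r$ is convex, $\psi_r(0) = 0$ and $\psi_r(1) = 1$, this implies that for any $\tau \in (0,1)$ and any $s \in (0, \tau]$, $\psi_r(s) \le s\,\psi_r(\tau)/\tau < s$. Choose $\tau \in (0,1)$ and define $\rho = 1 - \psi_r(\tau)/\tau > 0$. Then

\[ \forall s \in [0, \tau] : \quad s - \psi_r(s) \ge \rho s. \]

Now use that

\[
\begin{aligned}
P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big)
&\le P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le \tau} W_s - C\rho s \Big) \\
&\le P\Big( \inf_{s\le 0} W_s - Cs < W_\tau - C\rho\tau \Big).
\end{aligned}
\]

This last probability we can calculate exactly:

\[
\begin{aligned}
P\Big( \inf_{s\le 0} W_s - Cs < W_\tau - C\rho\tau \Big)
&= 1 - \Phi(C\sqrt{\tau}\rho) + \frac{1}{\sqrt{2\pi\tau}} \int_0^{\infty} e^{-2Cv} e^{-(v - C\rho\tau)^2/(2\tau)}\, dv \qquad (2.1) \\
&= 1 - \Phi(C\sqrt{\tau}\rho) + \big(1 - \Phi(C\sqrt{\tau}(2-\rho))\big)\, e^{C^2\tau(2-\rho)^2/2}\, e^{-C^2\tau\rho^2/2} \\
&\le \sqrt{\frac{2}{\pi\tau}}\, \frac{1}{C\rho(2-\rho)}\, e^{-C^2\tau\rho^2/2}.
\end{aligned}
\]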

We will now consider the case where $F_0(st)/F_0(t)$ actually has a limit. This is comparable to saying that $F_0$ is a regularly varying function in 0, but we have the extra information that $F_0$ is convex.

Lemma 2.3 Suppose for each $s \in (0, 1]$ we have $F_0(s) > 0$ and

\[ \lim_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} = \psi_r(s). \]

Then either $\psi_r(s) = 0$ on $[0, 1)$, or $\psi_r(s) = s^{\alpha}$ for some $\alpha \ge 1$. In the latter case, we have that for each $T > 0$ (also for $T > 1$)

\[ \sup_{s\in[0,T]} \Big| \frac{F_0(st)}{F_0(t)} - s^{\alpha} \Big| \to 0 \qquad (t \downarrow 0). \]

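Proof: Suppose $0 < u < s < 1$. Then

\[ \psi_r(u) = \lim_{t\downarrow 0} \frac{F_0\big((u/s)\, st\big)}{F_0(st)} \cdot \frac{F_0(st)}{F_0(t)} = \psi_r(u/s)\, \psi_r(s). \]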



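Since $\psi_r$ is continuous and convex on $[0, 1)$ and $\psi_r(0) = 0$, we conclude that either $\psi_r(s) = 0$ on $[0, 1)$ or $\psi_r(s) = s^{\alpha}$ with $\alpha \ge 1$. In this last case, choose $s > 1$. Then

\[ \lim_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} = \Big( \lim_{t\downarrow 0} \frac{F_0(t)}{F_0(st)} \Big)^{-1} = \Big( \lim_{t\downarrow 0} \frac{F_0(s^{-1}t)}{F_0(t)} \Big)^{-1} = s^{\alpha}. \]

The family of convex functions $\{s \mapsto F_0(st)/F_0(t)\}$ converges pointwise to the convex function $s \mapsto s^{\alpha}$, and all functions are 0 in 0, so the convergence is actually uniform on compact subsets of $[0, \infty)$. □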



3 The monotone LS-estimator in white noise 

We will work in the white noise model, so our data $Y(t)$ satisfies

\[ dY(t) = f_0(t)\,dt + \varepsilon\, dW(t), \]

where $f_0$ is a monotone $L^2$-function on $[-1, 1]$ and $W(t)$ is standard two-sided Brownian motion. As usual, the parameter $\varepsilon$ should be compared to $n^{-1/2}$. We wish to study the least squares estimator, but in fact we will define for a realization of $W(t)$,

\[ Y(t) = \int_0^t f_0(s)\,ds + \varepsilon W(t), \]
Jo 
and the convex function

\[ \hat F(t) = \sup\{\phi(t) : \phi \text{ affine and } \forall s \in [-1, 1] : \phi(s) \le Y(s)\}. \]

So $\hat F$ is the greatest convex minorant of $Y$. Now we define the estimator $\hat f$ as the left-derivative of the convex function $\hat F$, so for $t \in (-1, 1)$

\[ \hat f(t) = \lim_{h\downarrow 0} \frac{\hat F(t) - \hat F(t - h)}{h}. \]

This is a monotone function and can be seen as a limit of least squares estimators over the class of monotone functions absolutely bounded by $M$, as $M \to \infty$.
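
For observations on a grid, the same construction has a well known discrete analogue: the least squares fit over nondecreasing sequences equals the left-hand slopes of the greatest convex minorant of the cumulative sum diagram. The following sketch illustrates that discrete analogue rather than the continuous white noise estimator itself; `gcm_vertices` and `monotone_ls_fit` are hypothetical names.

```python
def gcm_vertices(xs, ys):
    """Indices of the vertices of the greatest convex minorant of the
    points (xs[i], ys[i]), with xs strictly increasing (lower convex hull)."""
    hull = []
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # Drop i1 if it lies on or above the chord from i0 to i.
            cross = ((xs[i1] - xs[i0]) * (ys[i] - ys[i0])
                     - (ys[i1] - ys[i0]) * (xs[i] - xs[i0]))
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull

def monotone_ls_fit(y):
    """Nondecreasing least squares fit to y, computed as the left-hand
    slopes of the greatest convex minorant of the cumulative sums of y."""
    xs = list(range(len(y) + 1))
    cums = [0.0]
    for value in y:
        cums.append(cums[-1] + value)
    hull = gcm_vertices(xs, cums)
    fit = []
    for left, right in zip(hull, hull[1:]):
        slope = (cums[right] - cums[left]) / (xs[right] - xs[left])
        fit.extend([slope] * (right - left))
    return fit
```

For example, `monotone_ls_fit([1, 3, 2, 4])` pools the two middle observations and returns `[1.0, 2.5, 2.5, 4.0]`.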

We will assume without loss of generality that $f_0(0) = 0$. Furthermore, to ensure that our estimator $\hat f(0)$ is consistent as $\varepsilon \to 0$, we assume that $f_0$ is continuous in 0. We are interested in the probability of the event $\{\hat f(0) > a\}$, for $a > 0$. Define

\[ F_0(t) = \int_0^t f_0(s)\,ds. \]

Fix $C > 0$ not depending on $\varepsilon$, and choose $a, b > 0$ and $r_a, r_b > 0$ such that

\[ F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C\varepsilon. \tag{3.1} \]

Since $F_0$ is convex and continuous, and $f_0$ is continuous in 0, this can always be done if $F_0(1) > 0$ and $F_0(-1) > 0$, simply by choosing $\varepsilon$ small enough. We will consider the special (and simpler) case $f_0(t) = 0$ for all $t > 0$ (or for all $t < 0$) separately.
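
For a general convex $F_0$ the rate equations (3.1) have no closed form, but they are easy to solve numerically: eliminating $a$ gives $F_0(r_a)/\sqrt{r_a} = C\varepsilon$, and the left-hand side is increasing in $r_a$. The sketch below uses plain bisection on that scalar equation; `solve_rate` and its parameters are hypothetical names, and the caller is assumed to supply a convex `F0` with `F0(0) = 0`.

```python
import math

def solve_rate(F0, C, eps, r_max=1.0, iters=200):
    """Return (a, r) with F0(r) = a*r and sqrt(r)*a = C*eps, for r in (0, r_max].

    Assumes F0 is convex with F0(0) = 0, so that g(r) = F0(r)/sqrt(r)
    is nondecreasing on (0, r_max]."""
    def g(r):
        return F0(r) / math.sqrt(r)

    target = C * eps
    if g(r_max) < target:
        raise ValueError("no solution on (0, r_max]; decrease eps or C")
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    r = hi
    return F0(r) / r, r
```

For the left-hand pair $(b, r_b)$ the same routine can be called with `lambda t: F0(-t)`. With `F0 = lambda t: 0.5 * t * t` the solution is $r_a = (2C\varepsilon)^{2/3}$ and $a = r_a/2$, which gives a quick way to check an implementation.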

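Theorem 3.1 With the notations as above, we have that

\[ \limsup_{\varepsilon\downarrow 0} P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big) \]

and

\[ \limsup_{\varepsilon\downarrow 0} P\big(\hat f(0) < -b\big) \le P\Big( \inf_{s\le 0} W_s - Cs < \inf_{0\le s\le 1} W_s - C(s - \psi_l(s)) \Big). \]

Since both probabilities tend to zero when $C \to \infty$, it follows that Equations (3.1) determine an upper bound for the rate of convergence of $\hat f(0)$.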

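Proof: We will only show the result for $a$; the proof for $b$ is completely similar. Note that we have the following "switch relation" for the greatest convex minorant:

\[ \{\hat f(0) > a\} = \Big\{ \inf_{-1\le t\le 0} (\varepsilon W_t + F_0(t) - at) < \inf_{0\le t\le 1} (\varepsilon W_t + F_0(t) - at) \Big\}. \tag{3.2} \]

We can rewrite (3.2) as follows:

\[
\begin{aligned}
\{\hat f(0) > a\} = \Big\{ &\inf_{-r_a^{-1}\le s\le 0} \big( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - \varepsilon^{-1} r_a^{1/2} a s \big) < \\
&\inf_{0\le s\le r_a^{-1}} \big( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - \varepsilon^{-1} r_a^{1/2} a s \big) \Big\}.
\end{aligned}
\tag{3.3}
\]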
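Define

\[ \tilde W_s = r_a^{-1/2} W_{r_a s}. \]

Clearly, $\tilde W_s$ is also a two-sided Brownian motion. Now we can use Lemma 2.1: for any $\tau \in (0, 1)$ there exists a positive continuous function $\eta$ with $F_0(t) = 0 \Rightarrow \eta(t) = 0$, such that

\[ \inf_{0\le s\le r_a^{-1}} \big( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - \varepsilon^{-1} r_a^{1/2} a s \big) \le \inf_{0\le s\le \tau} \tilde W_s - C(s - \psi_r(s)) + C\eta(r_a). \]

Here we used that $\varepsilon^{-1} r_a^{1/2} a = C$ and $\varepsilon^{-1} r_a^{-1/2} F_0(r_a) = C$. Now remark that for $s \le 0$, $F_0(s) \ge 0$, so that

\[ \inf_{-r_a^{-1}\le s\le 0} \big( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) \ge \inf_{s\le 0} (\tilde W_s - Cs). \]

In view of (3.3), we have shown that

\[ P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} (\tilde W_s - Cs) < \inf_{0\le s\le \tau} \tilde W_s - C(s - \psi_r(s)) + C\eta(r_a) \Big). \tag{3.4} \]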

Define $r_0 = \lim_{a\downarrow 0} r_a$. Since we always have that $F_0(r_0) = 0$, we conclude that $\lim_{a\downarrow 0} \eta(r_a) = 0$, so

\[ \limsup_{\varepsilon\downarrow 0} P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < \inf_{0\le s\le \tau} W_s - C(s - \psi_r(s)) \Big). \]

Since this is true for all $\tau \in (0, 1)$, and since $\psi_r$ is increasing on $[0, 1]$, we conclude that

\[ \limsup_{\varepsilon\downarrow 0} P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < \inf_{0\le s\le 1} W_s - C(s - \psi_r(s)) \Big). \]

When $f_0(t) = 0$ for all $t > 0$, we choose $a = C\varepsilon$, and (3.2) implies that

\[ P\big(\hat f(0) > a\big) \le P\Big( \inf_{-1\le t\le 0} (W_t - Ct) < \inf_{0\le t\le 1} (W_t - Ct) \Big). \]

This shows that in this case, the upper confidence limit for $\hat f(0)$ is of order $\varepsilon$ (parametric rate). This also happens when $r_0 > 0$, which is the case when $f_0$ is flat to the right of 0. □

3.1 $L^q$ convergence of the LS-estimator

The basis for deriving the $L^q$ ($q > 0$) convergence of the least squares estimator will be Equation (3.4), together with Lemma 2.2 and a uniform integrability argument. We note that (3.4) holds for all choices of $C > 0$, as long as Equations (3.1) for $a$ and $r_a$ have a solution. To ensure this, we choose $\delta \in (0, 1)$ small and define $a_\delta > 0$ and $C_\delta > 0$ such that

\[ F_0(\delta) = a_\delta \delta \quad\text{and}\quad a_\delta \delta^{1/2} = \varepsilon C_\delta. \]

This is possible as soon as $F_0(1) > 0$; the case $F_0(1) = 0$ is in fact easier. So for any $C \le C_\delta$, we have $r_a \le \delta$ and since $\eta$ is increasing, we get

\[ P\big(\hat f(0) > a\big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < \inf_{0\le s\le \tau} W_s - C(s - \psi_r(s)) + C\eta(\delta) \Big). \]

Vs<0 0<S<T / 
Note that from the derivation of Equation (3.4) it follows that we can choose $\tau \in (0,1)$ fixed, independent of $C$ and $\varepsilon$. Now we wish to make the dependence of $a$ on $C$ more specific. To this end, we introduce two auxiliary functions $G_0$ and $H_0$:

\[ G_0(t) = F_0(t)/t \quad\text{and}\quad H_0(a) = a \sqrt{G_0^{-1}(a)} \ \text{ for } a \le a_\delta. \]

Since $F_0$ is convex and has derivative 0 in 0, we get that $G_0$ is strictly increasing on the set $\{t \in [0,1] : F_0(t) > 0\}$. This means that $G_0^{-1}$ can be defined on $[0, a_\delta]$ in a continuous, strictly increasing manner. Therefore, $H_0$ will be a continuous, strictly increasing function on $[0, a_\delta]$ with $H_0(0) = 0$. Clearly,

\[ r_a = G_0^{-1}(a) \quad\text{and}\quad H_0(a) = C\varepsilon. \]

We extend the definition of $H_0$ to $[0, \infty)$ by

\[ H_0(a) = H_0(a_\delta) + a - a_\delta \quad\text{for } a > a_\delta. \]

In this way, $H_0$ remains a continuous and strictly increasing function. We define for all $C > 0$ and $\varepsilon > 0$ the value $a$ by $H_0(a) = C\varepsilon$. We can show the following proposition, using the notation $x_+ = \max(0, x)$.

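Proposition 3.2 Suppose $\psi_r(s) < s$ for some $s \in (0, 1)$ (and hence for all $s \in (0, 1)$). With the notations as above, we have for all $\varepsilon$ small enough, that for all $C > 0$

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < W_\tau - \tfrac{1}{4}\delta C \Big). \]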

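Proof: Since $H_0$ is strictly increasing, we have

\[ P\big(\hat f(0) > a\big) = P\big( H_0(\hat f(0)_+) > H_0(a) \big) = P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big). \]

So for any $C \le C_\delta = a_\delta \delta^{1/2} \varepsilon^{-1}$, we get

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < \inf_{0\le s\le \tau} W_s - C(s - \psi_r(s)) + C\eta(\delta) \Big). \]

What can we say when $C > C_\delta$? With $a$ defined by $H_0(a) = C\varepsilon$, we get that $a > a_\delta$ and

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) = P\big(\hat f(0) > a\big), \]

so we can use Equation (3.2) to conclude that

\[
\begin{aligned}
P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big)
&= P\Big( \inf_{-1\le t\le 0} (\varepsilon W_t + F_0(t) - at) < \inf_{0\le t\le 1} (\varepsilon W_t + F_0(t) - at) \Big) \\
&\le P\Big( \inf_{-1\le t\le 0} (\varepsilon W_t - at) < \inf_{0\le t\le \delta} (\varepsilon W_t + a_\delta t - at) \Big) \\
&\le P\Big( \inf_{-1\le t\le 0} \big(W_t - (a_\delta \varepsilon^{-1} + C - C_\delta) t\big) < \inf_{0\le t\le \delta} \big(W_t - (C - C_\delta) t\big) \Big).
\end{aligned}
\]

Now note that $a_\delta \varepsilon^{-1} = \delta^{-1/2} C_\delta > C_\delta$, which leads us to

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{-1\le t\le 0} (W_t - Ct) < \inf_{0\le t\le \delta} \big(W_t - (C - C_\delta) t\big) \Big). \]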
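Now we use that $\psi_r(s) < s$, also in view of Lemma 2.2. Choose $\delta$ so small that $\delta < \tau$ and $\tau - \psi_r(\tau) - \eta(\delta) > \delta/2$. Then for $C \le C_\delta$ we get

\[
\begin{aligned}
P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big)
&\le P\Big( \inf_{s\le 0} (W_s - Cs) < \inf_{0\le s\le \tau} W_s - C(s - \psi_r(s)) + C\eta(\delta) \Big) \\
&\le P\Big( \inf_{s\le 0} (W_s - Cs) < W_\tau - \tfrac{1}{2}\delta C \Big).
\end{aligned}
\]

For $C > C_\delta$ we have

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{t\le 0} (W_t - Ct) < W_\delta - (C - C_\delta)\delta \Big). \]

Since for $C$ big enough, we have

\[ P\Big( \inf_{s\le 0} (W_s - Cs) < W_\delta - \tfrac{1}{2}\delta C \Big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < W_\tau - \tfrac{1}{2}\delta C \Big), \]

we conclude that for $C \ge 2C_\delta$,

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < W_\tau - \tfrac{1}{2}\delta C \Big). \]

Note that $P(\varepsilon^{-1} H_0(\hat f(0)_+) > C)$ is a decreasing function of $C$, so for $\varepsilon$ small enough, which means $C_\delta$ big enough, we can conclude for all $C > 0$ that

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) \le P\Big( \inf_{s\le 0} (W_s - Cs) < W_\tau - \tfrac{1}{4}\delta C \Big). \]

□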

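The definition of $H_0(a)$ depends on the choice of $\delta > 0$ when $a > a_\delta$, but since $\hat f(0) \to 0$, this should not be relevant. To prove this, we show the following corollary.

Corollary 3.3 Suppose $\psi_r(s) < s$ for some $s \in (0,1)$. Let $\chi : [0, \infty) \to [0, \infty)$ be such that for some constants $K > 0$ and $m \ge 1$

\[ \chi(a) \le H_0(a) \ \text{ for } a \le a_\delta, \qquad \chi(a) \le K H_0(a)^m \ \text{ for } a \ge a_\delta. \]

Then there exist constants $L_1, L_2, \gamma, \varepsilon_0 > 0$ such that for all $0 < \varepsilon < \varepsilon_0$ and $C > 0$

\[ P\big( \varepsilon^{-1} \chi(\hat f(0)_+) > C \big) \le L_1 e^{-L_2 C^{\gamma}}. \]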

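Proof: Note that for $\varepsilon < 1$,

\[
\begin{aligned}
P\big( \varepsilon^{-1} \chi(\hat f(0)_+) > C \big)
&= P\big( \varepsilon^{-1} \chi(\hat f(0)_+) > C \wedge \hat f(0) \le a_\delta \big) + P\big( \varepsilon^{-1} \chi(\hat f(0)_+) > C \wedge \hat f(0) > a_\delta \big) \\
&\le P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) + P\big( K \varepsilon^{-1} H_0(\hat f(0)_+)^m > C \big) \\
&\le P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) + P\big( \varepsilon^{-1/m} H_0(\hat f(0)_+) > K^{-1/m} C^{1/m} \big) \\
&\le P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > C \big) + P\big( \varepsilon^{-1} H_0(\hat f(0)_+) > K^{-1/m} C^{1/m} \big) \\
&\le L_1 e^{-L_2 C^{\gamma}},
\end{aligned}
\]

for some choice of $L_1, L_2, \gamma, \varepsilon_0 > 0$ and all $\varepsilon < \varepsilon_0$. In the last step we used Proposition 3.2 and Equation (2.1). □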



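To see how we can use Corollary 3.3, let us assume that $\psi_r(s) < s$ and that for positive $x$ in some neighbourhood of 0, we have

\[ f_0(x) \lesssim x^{\alpha}, \]

for some $\alpha > 0$. Then $F_0(x) \lesssim x^{\alpha+1}$, so $G_0(x) \lesssim x^{\alpha}$. Therefore, $G_0^{-1}(a) \gtrsim a^{1/\alpha}$, and

\[ H_0(a) \ge R\, a^{\frac{2\alpha+1}{2\alpha}}, \]

for some $R > 0$, at least for $0 \le a \le a_\delta$ if we choose $\delta > 0$ small enough. Now define for all $a \ge 0$

\[ \chi(a) = R\, a^{\frac{2\alpha+1}{2\alpha}}. \]

Corollary 3.3 then shows that for all $C > 0$

\[ P\big( \varepsilon^{-1} \chi(\hat f(0)_+) > C \big) = P\Big( \varepsilon^{-\frac{2\alpha}{2\alpha+1}} \hat f(0)_+ > (C/R)^{\frac{2\alpha}{2\alpha+1}} \Big) \le L_1 e^{-L_2 C^{\gamma}}, \]

and this proves that for any $q > 0$,

\[ \limsup_{\varepsilon\downarrow 0} E\Big( \big( \varepsilon^{-\frac{2\alpha}{2\alpha+1}} \hat f(0)_+ \big)^q \Big) < +\infty. \]

Of course we can get a similar result for $\hat f(0)_-$ and for $|\hat f(0)|$. The condition $\psi_r(s) < s$ does not affect the rate of the estimator. It is only necessary to get control over the tail of the rescaled LS estimator.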

3.2 Limiting distribution of the Least Squares estimator

Our methods also allow us to derive non-standard limiting distributions for the least squares estimator. These limiting distributions only exist when $f_0$ is somehow "regular" near 0. The precise conditions are described in the following theorem and will use Lemma 2.3. We start with the rate equations: for $\varepsilon > 0$ and $C > 0$ we define $a$, $r_a$, $b$ and $r_b$ by

\[ F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C\varepsilon. \]

Theorem 3.4 Suppose that

\[ \lim_{\varepsilon\downarrow 0} \frac{r_a}{r_b} = \gamma, \]

with $\gamma \in [0, \infty)$. Furthermore, suppose that for $s > 0$,

\[ \lim_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} = s^{\alpha} \]

for $\alpha > 1$ (see also Lemma 2.3). Then, if $W_s$ ($s \in \mathbb{R}$) denotes two-sided standard Brownian motion,

\[ \lim_{\varepsilon\to 0} P\big(\hat f(0) > a\big) = P\Big( \inf_{s\le 0} (W_s + C\gamma^{\alpha-1/2}|s|^{\alpha} - Cs) < \inf_{s\ge 0} (W_s + Cs^{\alpha} - Cs) \Big). \]

If $\lim_{\varepsilon\downarrow 0} r_a/r_b = +\infty$, then

\[ \lim_{\varepsilon\to 0} P\big(\hat f(0) > 0\big) = 0. \]
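Proof: We start with assuming that $\gamma > 0$. Since $r_a/r_b \to \gamma$ and $a r_a^{1/2} = b r_b^{1/2}$, we see that $a/b \to \gamma^{-1/2}$ and $F_0(r_a)/F_0(-r_b) \to \gamma^{1/2}$ (since $F_0(r_a) = a r_a$ and $F_0(-r_b) = b r_b$). For each $\eta > 0$, we have that $(\gamma - \eta) r_b \le r_a \le (\gamma + \eta) r_b$, for $\varepsilon$ small enough. Therefore

\[ \limsup_{\varepsilon\downarrow 0} \frac{F_0(r_a)}{F_0(\gamma r_b)} \le \lim_{\varepsilon\downarrow 0} \frac{F_0((\gamma+\eta) r_b)}{F_0(\gamma r_b)} = \Big( \frac{\gamma+\eta}{\gamma} \Big)^{\alpha}. \]

We used that, since $\psi_r(s) > 0$ for $s \in (0, 1]$, we have that $r_a \to 0$ and $r_b \to 0$. The inequality holds for all $\eta > 0$, and we can show a similar inequality for the liminf, which means that

\[ \lim_{\varepsilon\downarrow 0} \frac{F_0(r_a)}{F_0(\gamma r_b)} = 1. \]

Since $r_a$ and $r_b$ are continuous functions of $\varepsilon$ that decrease to 0 as $\varepsilon \downarrow 0$, we have shown that in fact

\[ \lim_{t\downarrow 0} \frac{F_0(\gamma t)}{F_0(-t)} = \gamma^{1/2}. \]

This in turn implies that

\[ \lim_{t\downarrow 0} \frac{F_0(-st)}{F_0(-t)} = \lim_{t\downarrow 0} \frac{F_0(\gamma s t)}{F_0(\gamma t)} = s^{\alpha}. \]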

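So the rescaled behavior of $F_0$ to the left of zero is equal to the behavior of $F_0$ to the right of zero. The rest of the proof is based on Equation (3.3):

\[
\begin{aligned}
P\big(\hat f(0) > a\big) = P\Big( &\inf_{-r_a^{-1}\le s\le 0} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) < \\
&\inf_{0\le s\le r_a^{-1}} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) \Big).
\end{aligned}
\]

Here, $W_s$ is two-sided Brownian motion. Note that we can rewrite this equation as

\[ P\big(\hat f(0) > a\big) = P\Big( \operatorname*{argmin}_{s\in[-r_a^{-1},\, r_a^{-1}]} \big\{ W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big\} < 0 \Big). \]

Using Lemma 2.3 we conclude that there exists a family of functions $\eta_t(s)$ on $[0, \infty)$, such that $\eta_t \to 0$ uniformly on compacta as $t \to 0$, with

\[ F_0(st) = s^{\alpha} F_0(t) + \eta_t(s) F_0(t) \qquad (t \in \mathbb{R}). \]

This shows that for $s \in [0, \infty)$, we have

\[ W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs = W_s + Cs^{\alpha} + C\eta_{r_a}(s) - Cs \;\to\; W_s + Cs^{\alpha} - Cs, \]

uniformly on compacta. For $s \in (-\infty, 0]$, we have to be a bit more careful:

\[
\begin{aligned}
\varepsilon^{-1} r_a^{-1/2} F_0(r_a s)
&= \varepsilon^{-1} r_a^{-1/2} \Big( \frac{r_a}{r_b} \Big)^{\alpha} |s|^{\alpha} F_0(-r_b) + \varepsilon^{-1} r_a^{-1/2}\, \eta_{-r_b}\Big( |s| \frac{r_a}{r_b} \Big) F_0(-r_b) \\
&= C \Big( \frac{r_a}{r_b} \Big)^{\alpha - 1/2} |s|^{\alpha} + C \Big( \frac{r_a}{r_b} \Big)^{-1/2} \eta_{-r_b}\Big( |s| \frac{r_a}{r_b} \Big) \\
&\to C \gamma^{\alpha - 1/2} |s|^{\alpha},
\end{aligned}
\]

uniformly on compacta. We have shown that uniformly on compacta

\[ W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \;\to\; \begin{cases} W_s + Cs^{\alpha} - Cs & \text{for } s \ge 0, \\ W_s + C\gamma^{\alpha-1/2}|s|^{\alpha} - Cs & \text{for } s < 0. \end{cases} \]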

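Now we wish to use Theorem 2.7 from [Kim, Pollard (1990)], p. 198. This theorem implies that the location of the minimum of the process $W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs$ converges in distribution to the location of the minimum of its limiting process, provided that this location is $O_P(1)$. To show this last condition, we consider for $M > 1$

\[ P\Big( \operatorname*{argmin}_{s\in[-r_a^{-1},\, r_a^{-1}]} \big\{ W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big\} > M \Big) \le P\Big( \inf_{s\ge M} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) \le 0 \Big). \]

Now we use that for $\varepsilon$ small enough, $F_0(M r_a) \ge M^{\alpha} F_0(r_a) - F_0(r_a)$, so for $s \ge M$, using convexity of $F_0$, we get

\[ F_0(s r_a) \ge s F_0(M r_a)/M \ge M^{\alpha-1} a r_a s - a r_a s/M. \]

Using this we get

\[ P\Big( \inf_{s\ge M} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) \le 0 \Big) \le P\Big( \inf_{s\ge M} \big( W_s + C M^{\alpha-1} s - Cs/M - Cs \big) \le 0 \Big). \]

Clearly, this last probability goes to zero exponentially fast as $M \to +\infty$, since $\alpha > 1$. Now we have to check the lower bound for the location of the minimum:

\[
\begin{aligned}
P\Big( \operatorname*{argmin}_{s\in[-r_a^{-1},\, r_a^{-1}]} \big\{ W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big\} < -M \Big)
&\le P\Big( \inf_{s\le -M} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs \big) \le 0 \Big) \\
&\le P\Big( \inf_{s\le -M} (W_s - Cs) \le 0 \Big).
\end{aligned}
\]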

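This last probability again goes to zero exponentially fast as $M \to +\infty$. This proves the theorem for $\gamma > 0$. When $\gamma = 0$, so $r_a/r_b \to 0$, the above reasoning goes through, except for the convergence of the process $W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) - Cs$ for $s \in (-\infty, 0]$. We need to show that

\[ \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) \to 0, \]

uniformly on compact subsets of $(-\infty, 0]$. Fix a compact set $[-M, 0]$ and choose $\varepsilon$ so small that $M r_a < r_b$. Then for all $s \in [-M, 0]$, by convexity of $F_0$,

\[ |\varepsilon^{-1} r_a^{-1/2} F_0(r_a s)| \le \varepsilon^{-1} r_a^{-1/2} F_0(-r_b)\, |s|\, \frac{r_a}{r_b} = C |s| \Big( \frac{r_a}{r_b} \Big)^{1/2} \to 0. \]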

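Finally we need to prove the last statement. For this, we directly use Equation (3.2):

\[ P\big(\hat f(0) > 0\big) = P\Big( \inf_{-1\le t\le 0} (\varepsilon W_t + F_0(t)) < \inf_{0\le t\le 1} (\varepsilon W_t + F_0(t)) \Big). \]

Now we take the usual rescaling, replacing $t$ by $r_a s$ and multiplying with $\varepsilon^{-1} r_a^{-1/2}$:

\[ P\big(\hat f(0) > 0\big) = P\Big( \inf_{-r_a^{-1}\le s\le 0} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) \big) < \inf_{0\le s\le r_a^{-1}} \big( W_s + \varepsilon^{-1} r_a^{-1/2} F_0(r_a s) \big) \Big). \]

Choose $\varepsilon$ small such that $r_a > r_b$. Then if $s \le -r_b/r_a$, we have $F_0(r_a s) \ge |s| F_0(-r_b)\, r_a/r_b$, whereas if $-r_b/r_a \le s \le 0$, we still have that $F_0(r_a s) \ge 0$, so

\[
\begin{aligned}
P\big(\hat f(0) > 0\big) \le\ & P\Big( \inf_{-r_a^{-1}\le s\le -r_b/r_a} \Big( W_s + \Big( \frac{r_a}{r_b} \Big)^{1/2} C|s| \Big) < \inf_{0\le s\le 1} W_s + Cs \Big) \\
&+ P\Big( \inf_{-r_b/r_a\le s\le 0} W_s < \inf_{0\le s\le 1} W_s + Cs \Big).
\end{aligned}
\]

Since $r_a/r_b \to +\infty$, these two probabilities clearly go to zero, since $\inf_{0\le s\le 1} W_s + Cs < 0$ with probability 1. Note that for this last result, we do not need any other assumptions on $F_0$. □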

As before, we introduce the auxiliary functions $G_0$ and $H_0$, but now on a full neighborhood of 0: fix $\delta > 0$ and for $t \in (-\delta, \delta)$

\[ G_0(t) = F_0(t)/t \quad\text{and}\quad H_0(t) = t \sqrt{|G_0^{-1}(t)|}. \]

As before, we have that both $G_0$ and $H_0$ are strictly increasing functions on $(-\delta, \delta)$. We also know that the rate equations (3.1) imply that

\[ H_0(a) = C\varepsilon \quad\text{and}\quad H_0(-b) = -C\varepsilon. \]



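Corollary 3.5 Suppose

\[ \lim_{\varepsilon\downarrow 0} \frac{r_a}{r_b} = \gamma, \]

with $\gamma \in [0, \infty)$. Furthermore, suppose that for $s > 0$,

\[ \lim_{t\downarrow 0} \frac{F_0(st)}{F_0(t)} = s^{\alpha} \]

for $\alpha > 1$ (see also Lemma 2.3). If $W_s$ ($s \in \mathbb{R}$) denotes two-sided standard Brownian motion, define the process

\[ X(s) = \begin{cases} W_s + s^{\alpha} & \text{for } s \ge 0, \\ W_s + \gamma^{\alpha-1/2}|s|^{\alpha} & \text{for } s < 0, \end{cases} \]

and the process $\hat X(s)$ as the greatest convex minorant of $X$. Then

\[ \varepsilon^{-1} H_0(\hat f(0)) \xrightarrow{d} \mathrm{sgn}\Big( \frac{d\hat X}{ds}(0) \Big) \Big| \frac{d\hat X}{ds}(0) \Big|^{\frac{2\alpha-1}{2\alpha-2}}. \]

Here, $\mathrm{sgn}(x)$ denotes the sign of $x \in \mathbb{R}$.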
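Proof: We start by considering $P(\varepsilon^{-1} H_0(\hat f(0)) > C)$, for $C > 0$. We get

\[
\begin{aligned}
P\big( \varepsilon^{-1} H_0(\hat f(0)) > C \big) &= P\big(\hat f(0) > a\big) \\
&\to P\Big( \inf_{s\le 0} (W_s + C\gamma^{\alpha-1/2}|s|^{\alpha} - Cs) < \inf_{s\ge 0} (W_s + C|s|^{\alpha} - Cs) \Big),
\end{aligned}
\]

according to Theorem 3.4. Now replace $s$ by $C^{2/(1-2\alpha)} s$, multiply left and right by $C^{1/(2\alpha-1)}$ and use Brownian scaling to get

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)) > C \big) \to P\Big( \inf_{s\le 0} \big( W_s + \gamma^{\alpha-1/2}|s|^{\alpha} - C^{\frac{2\alpha-2}{2\alpha-1}} s \big) < \inf_{s\ge 0} \big( W_s + s^{\alpha} - C^{\frac{2\alpha-2}{2\alpha-1}} s \big) \Big). \]

Using the switch relation for the greatest convex minorant, we see that

\[ P\big( \varepsilon^{-1} H_0(\hat f(0)) > C \big) \to P\Big( \frac{d\hat X}{ds}(0) > C^{\frac{2\alpha-2}{2\alpha-1}} \Big). \]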

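When $\gamma = 0$, the proof is finished, since in that case $\frac{d\hat X}{ds}(0) > 0$ almost surely. Now suppose $\gamma > 0$. We have seen in the proof of Theorem 3.4 that the scaling of $F_0$ to the left of 0 is the same as the scaling to the right, so for all $s > 0$

\[ \lim_{t\downarrow 0} \frac{F_0(-st)}{F_0(-t)} = s^{\alpha}. \]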

Consider for C > 

Fie-'HoUm < -C) 



(/(O) < -b) 

mf{Ws + C7^"+^/2|sr - Cs) < mi{Ws + C|sr - Cs)] , 

s<0 s>0 J 



using Theorem 13.41 for the left hand side of the origin (that is, interchange a and b and replace 
7 by 1/7). Now replace s by —jC'^^^^~'^"^s, multiply left and right by 7^1/2(7^1/(1-20) 
Brownian scaling to get 

F{e^^Ho{fiO)) < -C) — > P ( inf (H^, + + C^s) < inf (W, + 7°-i/2|s|° + C^s)] . 

\s>0 s<0 J 

Note that the two infima have switched sides because of the scaling with a negative constant. Again 
using the switch relation we get 



P(e-^Fo(/(0)) < -C) 



dX 2a-2 

-(0) < 



dX , , 



< -c 
This proves the corollary.



□ 



The condition that $F_0$ is regularly varying around 0 with parameter $\alpha > 1$ implies that the function $H_0$ is regularly varying around 0 with parameter $\beta = (2\alpha - 1)/(2\alpha - 2)$, so for all $s > 0$

\[ \lim_{t\to 0} \frac{H_0(st)}{H_0(t)} = s^{\beta}. \]

It is well known from the theory of regularly varying functions that this limit is uniform for $s \in [1/M, M]$, for any $M > 1$. This will help us prove the next corollary:
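
The value of $\beta$ is easiest to see in the pure power case. The short computation below assumes $F_0(t) = t^{\alpha}$ for $t \ge 0$ exactly, which is an illustrative special case rather than the general regularly varying setting of the text.

```latex
% Assumption (illustrative): F_0(t) = t^{\alpha} for t >= 0, with \alpha > 1.
\[
  G_0(t) = \frac{F_0(t)}{t} = t^{\alpha-1},
  \qquad
  G_0^{-1}(a) = a^{\frac{1}{\alpha-1}},
\]
\[
  H_0(a) = a \sqrt{G_0^{-1}(a)}
         = a^{\,1 + \frac{1}{2(\alpha-1)}}
         = a^{\frac{2\alpha-1}{2\alpha-2}}
         = a^{\beta},
  \qquad
  H_0^{-1}(\varepsilon) = \varepsilon^{\frac{2\alpha-2}{2\alpha-1}} .
\]
% For \alpha = 2: \beta = 3/2 and H_0^{-1}(\varepsilon) = \varepsilon^{2/3},
% which is n^{-1/3} when \varepsilon = n^{-1/2}.
```

With $\alpha = 2$ this recovers the cube-root normalisation that appears in the differentiable case.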

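Corollary 3.6 With the conditions and notations from Corollary 3.5 we can show that

\[ \frac{\hat f(0)_+}{H_0^{-1}(\varepsilon)} \xrightarrow{d} \frac{d\hat X}{ds}(0)_+ \quad\text{and}\quad \frac{\hat f(0)_-}{-H_0^{-1}(-\varepsilon)} \xrightarrow{d} \frac{d\hat X}{ds}(0)_-. \]

Proof: We wish to show that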



H,' (sgn(/(0))e 



/(O) 



sgn(/(0)) ^1 in probability. 



(3.5) 



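Suppose $\eta > 0$. Using Corollary 3.5 there exists $M > 1$ such that for all $\varepsilon$ small enough

\[ P\big( H_0(\hat f(0)) \in [-\varepsilon M, -\varepsilon/M] \cup [\varepsilon/M, \varepsilon M] \big) \ge 1 - \eta. \]

If $H_0(\hat f(0)) \in [\varepsilon/M, \varepsilon M]$, we know that

\[ H_0^{-1}(\varepsilon/M) \le \hat f(0) \le H_0^{-1}(\varepsilon M). \]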



Since Hq is regularly varying around with parameter 1//?, we then know that for e small enough, 
A similar reasoning shows that if i7o(/(0)) G [—e/M, —eAd], then for e small enough, 



Now consider 

e-^Hoifm = sgn(/(0))i7o ( i^o"Hsgn(/(0))e) 



< 2M^/^. 



//o~'(sgn(/(0))e) 



\/Ho{HQ\sgn{fme)). 



Since $H_0(st)/H_0(t) \to s^{\beta}$ uniformly for $s$ in compact subsets of $(0, \infty)$, we can conclude with probability higher than $1 - \eta$, that for $\varepsilon$ small enough,



6-ii7o(/(0))-sgn(/(0)) 



/(O) 



HQ\sgnif{0))e)^ 



< rj/M and e-^-ffo(/(0)) > 1/M. 
This proves (3.5). Corollary 3.5 then immediately shows that



sgn(/(0))/(0) d dX_^^^^ 



H-\sgn{f{0))e) ds 
This can be written in a nicer way when we look at $\hat f(0)_+$ and $\hat f(0)_-$:

/(0)+ d dX_ 



Ho\e) ds 

and 



(0)- 



/(0)_ '^^^(o) 



□ 



(t ^ 0). 



-H,\-e) ds 

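Suppose $f_0$ is differentiable in 0 with $f_0'(0) > 0$. Then, as $t \to 0$,

\[ \frac{F_0(st)}{F_0(t)} = \frac{\tfrac{1}{2} s^2 t^2 f_0'(0) + o(t^2)}{\tfrac{1}{2} t^2 f_0'(0) + o(t^2)} \to s^2. \]

Furthermore, $G_0(t) = F_0(t)/t = \tfrac{1}{2} f_0'(0)\, t + o(t)$, which implies that $G_0^{-1}(t) = 2 f_0'(0)^{-1} t + o(t)$, so

\[ H_0(t) = \sqrt{2}\, f_0'(0)^{-1/2}\, t^{3/2} + o(t^{3/2}). \]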

This means that 

Define X{s) = Ws + s^, with Ws twosided Brownian motion, and define X as the greatest convex 
minorant of X. Then Corollarv 13.61 tells us that 

-1/3 



in accordance with the classical result by Brunk in [Brunk (1970)] , when translated to the white 
noise model, except that we do not need a continuous derivative of /o in a neighbourhood of 0, we 
just need the existence of the derivative in 0. 
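The cube-root scaling in the differentiable case can be checked numerically. The sketch below is illustrative only: the function name `solve_rate`, the bracket `(lo, hi]` and the bisection scheme are assumptions of this example, not part of the paper. It solves the rate equations $F_0(r_a) = a r_a$ and $r_a^{1/2} a = C\varepsilon$ by eliminating $a$, which gives $F_0(r)/\sqrt{r} = C\varepsilon$; the left-hand side is increasing in $r$ when $F_0$ is convex with $F_0(0) = 0$.

```python
import math

def solve_rate(F0, C, eps, lo=1e-12, hi=1.0, iters=200):
    """Solve F0(r) = a*r and sqrt(r)*a = C*eps for (a, r) by bisection on r.

    F0 is assumed convex with F0(0) = 0, so F0(r)/sqrt(r) is increasing.
    """
    target = C * eps
    g = lambda r: F0(r) / math.sqrt(r)
    if g(hi) < target:
        raise ValueError("no solution on (0, hi]: eps is not small enough")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return F0(r) / r, r

# Example: f0(t) = c*t, so F0(t) = c*t^2/2 and f0'(0) = c (hypothetical values).
c, C = 3.0, 1.0
F0 = lambda t: 0.5 * c * t * t
a_big, _ = solve_rate(F0, C, 8e-3)
a_small, _ = solve_rate(F0, C, 1e-3)
# Dividing eps by 8 should divide the rate a by 8**(2/3) = 4.
ratio = a_big / a_small
```

For this quadratic $F_0$ the solution is available in closed form, $a = (c/2)^{1/3}(C\varepsilon)^{2/3}$, which matches the $\varepsilon^{2/3}$ normalisation above.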

3.3 Optimality of the rate 

We wish to show that the rate for the LS-estimator is "locally optimal" in the following (imprecise) sense: for each monotone $L^2$-function $f_0$, there exists a sequence of alternative monotone $L^2$-functions $f_1$, such that the rate of the LS-estimator for $f_0$ and $f_1$ cannot both be significantly improved by any other estimator. To be more precise, we will prove the following theorem:
Theorem 3.7 Choose two significance levels $\alpha \in (0,1)$ and $\beta \in (0,1/2)$. There exists $\eta > 0$, such that for all $\varepsilon > 0$ small enough, we can find a monotone $L^2$-function $f_1$ (close to $f_0$), and we can find a rate $\gamma(\varepsilon)$ with

$$\limsup_{\varepsilon \downarrow 0}\, \max_{i=0,1} P_{f_i}\left(|\hat f(0) - f_i(0)| > \gamma(\varepsilon)\right) < \alpha$$

and

$$\liminf_{\varepsilon \downarrow 0}\, \inf_{\theta}\, \max_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > \eta \cdot \gamma(\varepsilon)\right) > \beta,$$

where $\theta(Y)$ is any estimator of $f(0)$ based on the data $Y$.

Remark 1: One may want to choose different rate functions $\gamma_0$ and $\gamma_1$ for the two different functions $f_0$ and $f_1$, but it seemed natural to take them equal. In any case, this statement is stronger.

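Remark 2: Choose an event $A \subset C([-1,1])$ such that $P_{f_0}(Y \in A) \ge 1/2$ and $P_{f_1}(Y \notin A) \ge 1/2$. Define the estimator

$$\theta(Y) = f_0(0)\,1_A(Y) + f_1(0)\,1_{A^c}(Y).$$

Then for any choice of $\eta$ and $\gamma$, we would have

$$\max_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > \eta \cdot \gamma(\varepsilon)\right) \le \tfrac12,$$

which is why in Theorem 3.7 we choose $\beta \in (0,1/2)$.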

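Proof of Theorem 3.7: Choose $\varepsilon > 0$ small enough such that the equations

$$F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C\varepsilon$$

have solutions for some fixed $C > 0$ with

$$\frac{2}{\sqrt{2\pi}\,C} < \alpha.$$

Suppose that for this $\varepsilon$, $a \ge b$. The case $a < b$ can be handled analogously. Define for some fixed $0 < \delta < 1$

$$f_1(t) = \begin{cases} \delta a & \text{if } t \ge 0 \text{ and } f_0(t) < \delta a, \\ f_0(t) & \text{otherwise.} \end{cases}$$

Then $f_1$ is a monotone $L^2$-function. Note that $f_1$ will be discontinuous in 0. Define

$$\gamma(\varepsilon) = 2a.$$

Then Theorem 3.1 together with Lemma 2.2 shows that (remember that $a \ge b$)

$$P_{f_0}\left(|\hat f(0)| > \gamma(\varepsilon)\right) < \alpha.$$

Since $f_1 \ge f_0$, it easily follows that

$$P_{f_1}\left(\hat f(0) - \delta a < -2a\right) \le P_{f_0}\left(\hat f(0) < -a\right) \le \frac{1}{\sqrt{2\pi}\,C}.$$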
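Now we focus on $P_{f_1}\left(\hat f(0) > (2+\delta)a\right)$. Define

$$F_1(t) = \int_0^t f_1(s)\,ds.$$

We can use Equation (3.3) for the situation where the underlying function is $f_1$:

$$\begin{aligned} \{\hat f(0) > 2a\} = \Big\{ &\inf_{-r_a^{-1} \le s \le 0} \left( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_1(r_a s) - 2\varepsilon^{-1} r_a^{1/2} a s \right) < \\ &\inf_{0 \le s \le r_a^{-1}} \left( r_a^{-1/2} W_{r_a s} + \varepsilon^{-1} r_a^{-1/2} F_1(r_a s) - 2\varepsilon^{-1} r_a^{1/2} a s \right) \Big\}. \end{aligned}$$

Again we have that $F_1(s) \ge 0$ for $s < 0$. Define $s_{\delta a} = \inf\{t \ge 0 : f_0(t) \ge \delta a\}$. Clearly, $s_{\delta a} \le r_a$. We easily check that for $s \ge s_{\delta a}$, $F_1(s) = \delta a\, s_{\delta a} + F_0(s) - F_0(s_{\delta a})$. This implies that $F_1(r_a) \le (1+\delta) a r_a \le 2 a r_a$. Since $F_1$ is convex, we conclude that for $0 \le s \le 1$,

$$F_1(r_a s) \le 2 a r_a s.$$

Now we can follow the exact same steps as in the proof of Theorem 3.1, starting at Equation (3.3), to conclude that

$$P_{f_1}\left(\hat f(0) > 2a\right) \le \frac{1}{2\sqrt{2\pi}\,C}.$$

This clearly shows that

$$P_{f_1}\left(|\hat f(0) - f_1(0)| > \gamma(\varepsilon)\right) = P_{f_1}\left(\hat f(0) > (2+\delta)a\right) + P_{f_1}\left(\hat f(0) < -(2-\delta)a\right) < \alpha.$$

So we have shown that our rate $\gamma$ satisfies the first requirement of the theorem.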

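Now define $\mu$ as the probability measure on $C([-1,1])$ that corresponds to standard two-sided Brownian motion, and denote with $P_0$ and $P_1$ the measures corresponding to the model with $f_0$ and $f_1$ respectively. It is well known that

$$\frac{dP_i}{d\mu}(W) = \exp\left( \varepsilon^{-1} \int f_i(t)\,dW(t) - \tfrac12 \varepsilon^{-2} \int f_i(t)^2\,dt \right).$$

Therefore

$$\frac{dP_1}{dP_0}(W) = \exp\left( \varepsilon^{-1} \int (f_1(t) - f_0(t))\,dW(t) - \tfrac12 \varepsilon^{-2} \int f_1(t)^2\,dt + \tfrac12 \varepsilon^{-2} \int f_0(t)^2\,dt \right).$$

This means that

$$\begin{aligned} \|P_1 - P_0\|^2 &\le E_{P_0}\left( \frac{dP_1}{dP_0} \right)^2 - 1 \\ &= E_\mu \exp\left( \varepsilon^{-1} \int (2 f_1(t) - f_0(t))\,dW(t) - \varepsilon^{-2} \int f_1(t)^2\,dt + \tfrac12 \varepsilon^{-2} \int f_0(t)^2\,dt \right) - 1 \\ &= \exp\left( \tfrac12 \varepsilon^{-2} \int (2 f_1(t) - f_0(t))^2\,dt - \varepsilon^{-2} \int f_1(t)^2\,dt + \tfrac12 \varepsilon^{-2} \int f_0(t)^2\,dt \right) - 1 \\ &= \exp\left( \varepsilon^{-2} \int (f_1(t) - f_0(t))^2\,dt \right) - 1. \end{aligned}$$

We immediately see that

$$\int (f_1(t) - f_0(t))^2\,dt \le \delta^2 a^2 s_{\delta a} \le \delta^2 a^2 r_a = \delta^2 C^2 \varepsilon^2,$$

so we conclude that

$$\|P_1 - P_0\| \le \sqrt{\exp(C^2 \delta^2) - 1}.$$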

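Choose $\delta \in (0,1]$ small enough, such that $\|P_1 - P_0\| < 2 - 4\beta$. Choose $\eta = \delta/4$. Denote with $p_i$ the density of $P_i$ with respect to $\mu$ ($i = 0, 1$). We have that for any estimator $\theta$

$$\begin{aligned} \max_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > 2\eta a\right) &\ge \frac12 \sum_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > 2\eta a\right) \\ &= \frac12 E_\mu\left( 1_{\{|\theta(Y)| > 2\eta a\}}\, p_0(W) + 1_{\{|\theta(Y) - \delta a| > 2\eta a\}}\, p_1(W) \right) \\ &\ge \frac12 E_\mu\left( \min(p_0(W), p_1(W)) \right) \\ &= \frac12 \left( 1 - \frac12 \|P_1 - P_0\| \right) \\ &> \beta. \end{aligned}$$

This proves the theorem. □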

4 The LS-estimator with measurements on a grid 

In this section we wish to show that in the model

$$Y_i = f_0(x_i) + \varepsilon_i,$$

where $x_i = i/n$ ($i = -n, \ldots, n$) (so our measurements are taken on a grid) and the $\varepsilon_i$ are iid, we will get results analogous to the white noise model. The key observation is that when we take measurements on a grid, we can represent the least squares estimator $\hat f$ as the derivative of a greatest convex minorant, just as we did with the white noise model. When we define

$$g(t) = \sum_{i=-n}^{n} Y_i\, 1_{\{x_{i-1} < t \le x_i\}},$$

with $x_{-n-1} = -1 - 1/n$, and for $s \in [-1 - 1/n, 1]$

$$G(s) = \int_0^s g(t)\,dt,$$

we can define

$$\hat F(t) = \sup\{\varphi(t) \mid \varphi \text{ affine and } \forall\, {-1} - \tfrac1n \le s \le 1 : \varphi(s) \le G(s)\}.$$

Finally, the least squares estimator is defined as

$$\hat f(t) = \lim_{h \downarrow 0} \frac{\hat F(t) - \hat F(t - h)}{h}.$$
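The representation of the least squares estimator as a left-hand slope of a greatest convex minorant can be illustrated on a small data set. The sketch below is an illustration under simplifying assumptions: it uses a unit-spaced grid (rescaling the grid to spacing $1/n$ rescales both axes of the cumulative sum diagram equally and leaves the slopes unchanged), and the function names are hypothetical. It computes the monotone least squares fit once by the pool-adjacent-violators algorithm and once as the slopes of the lower convex hull of the cumulative sums.

```python
def pava(y):
    """Non-decreasing least squares fit by pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def gcm_left_slopes(y):
    """Left-hand slopes at 1..n of the greatest convex minorant of the cumulative sums."""
    cum = [0.0]
    for v in y:
        cum.append(cum[-1] + float(v))
    hull = [0]  # indices of the vertices of the lower convex hull
    for k in range(1, len(cum)):
        # Drop the last vertex while it lies on or above the chord to point k.
        while len(hull) > 1:
            i, j = hull[-2], hull[-1]
            if (cum[j] - cum[i]) * (k - j) >= (cum[k] - cum[j]) * (j - i):
                hull.pop()
            else:
                break
        hull.append(k)
    slopes = []
    for i, j in zip(hull, hull[1:]):
        slopes.extend([(cum[j] - cum[i]) / (j - i)] * (j - i))
    return slopes

y = [1, 3, 2, 4, 3.5, 0, 6]
fit_pava = pava(y)
fit_gcm = gcm_left_slopes(y)
```

Both routines return one fitted value per observation, and on this example they agree, which is the discrete counterpart of defining $\hat f$ through $\hat F$.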
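Define $a$ and $b$, depending on $n$, as follows:

$$F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C n^{-1/2}. \quad (4.1)$$

Here, as before, $C > 0$ is some fixed constant. We have the following result:

Theorem 4.1 With the notations as above, suppose $\mathrm{Var}(\varepsilon_i) = \sigma^2 < +\infty$. Then

$$\limsup_{n \to \infty} P(\hat f(0) > a) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sigma} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C}{\sigma} (s - \psi_r(s)) \right) \right)$$

and

$$\limsup_{n \to \infty} P(\hat f(0) < -b) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sigma} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C}{\sigma} (s - \psi_l(s)) \right) \right).$$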

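Proof: We start by bounding $P(\hat f(0) > a)$; the bound for $P(\hat f(0) < -b)$ follows completely analogously. As in (3.2), we note that

$$\{\hat f(0) > a\} = \left\{ \inf_{-1 - 1/n \le t \le 0} (G(t) - at) < \inf_{0 \le t \le 1} (G(t) - at) \right\}.$$

Define

$$\bar f_0(t) = \sum_{i=-n}^{n} f_0(x_i)\, 1_{\{x_{i-1} < t \le x_i\}}.$$

We now use a similar rescaling as with the white noise model, so $t = r_a s$ and multiplying left and right with $n^{1/2} r_a^{-1/2}$. Then $\hat f(0) > a$ precisely when

$$\begin{aligned} &\inf_{-(1 + 1/n) r_a^{-1} \le s \le 0} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt + r_a^{-1/2} n^{1/2} \int_0^{r_a s} \bar f_0(t)\,dt - (r_a n)^{1/2} a s \right) < \\ &\inf_{0 \le s \le r_a^{-1}} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt + r_a^{-1/2} n^{1/2} \int_0^{r_a s} \bar f_0(t)\,dt - (r_a n)^{1/2} a s \right). \quad (4.2) \end{aligned}$$

Since $f_0$ is increasing, it is not hard to see that for $s \ge 0$

$$\int_0^{r_a s} \bar f_0(t)\,dt \le F_0(r_a s) + n^{-1} f_0(\lceil r_a s \rceil).$$

Here, $\lceil r_a s \rceil$ signifies the first grid-point bigger than $r_a s$. As before, we can use Lemma 2.1: for any $\tau \in (0,1)$ there exists a positive continuous function $\eta$ with $F_0(t) = 0 \Rightarrow \eta(t) = 0$, such that for $s \in [0, \tau]$

$$F_0(r_a s) \le F_0(r_a) \psi_r(s) + F_0(r_a) \eta(r_a).$$

Here, $\eta(r_a) \to 0$. This means that for $0 \le s \le \tau$

$$r_a^{-1/2} n^{1/2} \int_0^{r_a s} \bar f_0(t)\,dt \le C \psi_r(s) + C \eta(r_a) + r_a^{-1/2} n^{-1/2} f_0(\lceil r_a s \rceil).$$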
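Furthermore, when $t < 0$, $f_0(t) \le 0$. This means that (4.2) implies

$$\begin{aligned} &\inf_{-(1 + 1/n) r_a^{-1} \le s \le 0} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt - Cs \right) < \\ &\inf_{0 \le s \le \tau} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt - C(s - \psi_r(s)) + C \eta(r_a) + r_a^{-1/2} n^{-1/2} f_0(\lceil r_a s \rceil) \right). \end{aligned}$$

Note that the process

$$s \mapsto r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt$$

converges to $\sigma W_s$, in the topology of uniform convergence on compacta, where $W_s$ is two-sided standard Brownian motion; this is because $n r_a = C^2 a^{-2} \to +\infty$. Also, $a \to 0$ and $\eta(r_a) \to 0$ as $n \to \infty$, so we conclude that

$$\limsup_{n \to \infty} P(\hat f(0) > a) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sigma} s \right) < \inf_{0 \le s \le \tau} \left( W_s - \frac{C}{\sigma} (s - \psi_r(s)) \right) \right).$$

Since this holds for any $\tau \in (0,1)$, we have proved the theorem. □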



4.1 Optimality of the rate 

We wish to prove an analog of Theorem 3.7 for the model with observations on a grid. We need an extra condition on the distribution of $\varepsilon_i$. This makes sense, because suppose that $\varepsilon_i \in \mathbb{Z}$ with probability 1, then it would be very easy to distinguish $f_0$ and $f_1$ if $f_1(0) - f_0(0) \notin \mathbb{Z}$. The condition we need is the following:

(C1) The distribution of $\varepsilon_i$, with $\mathrm{Var}(\varepsilon_i) = \sigma^2 < +\infty$, has a density $\phi$ with respect to the Lebesgue measure, such that there exists $M > 0$ with

$$\int \left( \phi^{1/2}(y - a) - \phi^{1/2}(y) \right)^2 dy \le M a^2.$$

This condition would follow from Hellinger differentiability at 0 of the model $a \mapsto \phi(\cdot - a)$.
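As a concrete instance, Condition (C1) can be checked for normally distributed errors. The sketch below is an illustration that is not taken from the paper: it assumes $\phi$ is the $N(0, \sigma^2)$ density, for which the Hellinger affinity between $\phi$ and its shift by $a$ equals $\exp(-a^2/(8\sigma^2))$, so the integral in (C1) is $2(1 - \exp(-a^2/(8\sigma^2))) \le a^2/(4\sigma^2)$ and one may take $M = 1/(4\sigma^2)$. The numerical integration (midpoint rule, truncation at 12 standard deviations) is only a sanity check of that closed form.

```python
import math

def normal_pdf(y, sigma):
    return math.exp(-0.5 * (y / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hellinger_shift(a, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule value of the integral of (sqrt(phi(y - a)) - sqrt(phi(y)))^2 dy."""
    lo = -half_width * sigma - abs(a)
    hi = half_width * sigma + abs(a)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        d = math.sqrt(normal_pdf(y - a, sigma)) - math.sqrt(normal_pdf(y, sigma))
        total += d * d
    return total * h

def hellinger_shift_exact(a, sigma):
    """Closed form for the normal density: 2 * (1 - exp(-a^2 / (8 sigma^2)))."""
    return 2.0 * (1.0 - math.exp(-a * a / (8.0 * sigma * sigma)))

sigma = 1.5
M = 1.0 / (4.0 * sigma ** 2)  # constant for which (C1) holds in the normal case
checks = [(a, hellinger_shift(a, sigma), hellinger_shift_exact(a, sigma), M * a * a)
          for a in (0.1, 1.0, 3.0)]
```

Each tuple in `checks` lists the shift, the numerical integral, the closed form and the bound $M a^2$; the bound follows from $1 - e^{-x} \le x$.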

Theorem 4.2 Suppose Condition (C1) holds. Choose two significance levels $\alpha \in (0,1)$ and $\beta \in (0,1/2)$. There exists $\eta > 0$, such that for all $n$ large enough, we can find a monotone increasing function $f_1$ (close to $f_0$), and we can find a rate $\gamma_n$ with

$$\limsup_{n \to \infty}\, \max_{i=0,1} P_{f_i}\left(|\hat f(0) - f_i(0)| > \gamma_n\right) < \alpha$$

and

$$\liminf_{n \to \infty}\, \inf_{\theta}\, \max_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > \eta \gamma_n\right) > \beta,$$

where $\theta(Y)$ is any estimator of $f(0)$ based on the data $Y$.
Proof: We just follow the steps as in the proof of Theorem 3.7, so we define $a$, $r_a$, $b$ and $r_b$ as in (4.1) with $C > 0$ such that

$$\frac{2\sigma}{\sqrt{2\pi}\,C} < \alpha.$$

For fixed $n$, suppose that $a \ge b$. Then we define

$$\gamma_n = 2a$$

and for some fixed $0 < \delta < 1$

$$f_1(t) = \begin{cases} \delta a & \text{if } t \ge 0 \text{ and } f_0(t) < \delta a, \\ f_0(t) & \text{otherwise.} \end{cases}$$

Theorem 4.1 shows that (remembering that $a \ge b$)

$$P_{f_0}\left(|\hat f(0) - f_0(0)| > \gamma_n\right) < \alpha.$$

Since /i > /o, we again have that 

1 



P/, (^/(0)-5a<-2aj <P/„(^/(0)<-aj < 

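To bound $P_{f_1}\left(\hat f(0) > (2+\delta)a\right)$, we follow the proof of Theorem 4.1 but with $f_0$ replaced with $f_1$. Define

$$\bar f_1(t) = \sum_{i=-n}^{n} f_1(x_i)\, 1_{\{x_{i-1} < t \le x_i\}}$$

and

$$F_1(t) = \int_0^t f_1(s)\,ds.$$

Then, in the model using $f_1$, $\hat f(0) > 2a$ precisely when

$$\begin{aligned} &\inf_{-(1 + 1/n) r_a^{-1} \le s \le 0} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt + r_a^{-1/2} n^{1/2} \int_0^{r_a s} \bar f_1(t)\,dt - 2 (r_a n)^{1/2} a s \right) < \\ &\inf_{0 \le s \le r_a^{-1}} \left( r_a^{-1/2} n^{1/2} \int_0^{r_a s} \sum_{i=-n}^{n} \varepsilon_i 1_{\{x_{i-1} < t \le x_i\}}\,dt + r_a^{-1/2} n^{1/2} \int_0^{r_a s} \bar f_1(t)\,dt - 2 (r_a n)^{1/2} a s \right). \end{aligned}$$

Now, following the steps after Equation (4.2), and using the fact that, as in the proof of Theorem 3.7, for $0 \le s \le 1$,

$$F_1(r_a s) \le 2 a r_a s,$$

we conclude that $\gamma_n$ satisfies the first requirement of the theorem.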

Now we have to note that in the model with measurements on a grid, the data $Y \in \mathbb{R}^{2n+1}$. As a dominating measure $\mu$ we just take the Lebesgue measure, and we get that the density $p_i$ of the data $Y$, when we are in the model

$$Y_j = f_i(x_j) + \varepsilon_j,$$

is given by

$$p_i(y) = \prod_{j=-n}^{n} \phi(y_j - f_i(x_j)).$$

Define

$$\begin{aligned} A_j &= \int \left( \phi^{1/2}(y - f_1(x_j)) - \phi^{1/2}(y - f_0(x_j)) \right)^2 dy \\ &= \int \left( \phi^{1/2}(y - (f_1(x_j) - f_0(x_j))) - \phi^{1/2}(y) \right)^2 dy. \end{aligned}$$

Let $H^2(p_0, p_1)$ denote the squared Hellinger distance between $p_0$ and $p_1$. Then it is a standard property of the Hellinger distance that

$$H^2(p_0, p_1) \le \sum_{j=-n}^{n} A_j. \quad (4.3)$$

But note that when we define $s_{\delta a} = \inf\{t \ge 0 : f_0(t) \ge \delta a\} \le r_a$, we get that $A_j = 0$ whenever $j < 0$ or $j > n s_{\delta a}$. Furthermore, using Condition (C1), we have that

$$A_j \le M \delta^2 a^2. \quad (4.4)$$

This shows that

$$\begin{aligned} \|P_1 - P_0\|_1 &\le 2 \sqrt{H^2(p_0, p_1)} \\ &\le 2 \sqrt{(n s_{\delta a} + 1) M \delta^2 a^2} \quad (4.5) \\ &\le 2 \delta \sqrt{M (C^2 + a^2)}, \end{aligned}$$

where the last step uses $n s_{\delta a} a^2 \le n r_a a^2 = C^2$. It follows that we can choose $\delta > 0$ small enough such that $\|P_1 - P_0\|_1 < 2 - 4\beta$. The rest of the proof now follows the proof of Theorem 3.7. □



5 The LS-estimator with measurements on random points 

In this section we consider the model

$$Y_i = f_0(X_i) + \varepsilon_i,$$

where $X_1, \ldots, X_n$ is an iid sample in $[-1, 1]$ with distribution function $G$, independent of the $\varepsilon_i$'s. We again wish to estimate $f_0(0)$, but our LS-estimator is slightly more complicated now. The idea is to identify the order statistic $X_{(i)}$ with $i$, and calculate the least squares estimator as if the measurements were done on the grid $1, \ldots, n$. So we define for $0 \le t \le n$:

$$h(t) = \sum_{i=1}^{n} Y_{(i)}\, 1_{\{i-1 < t \le i\}},$$

where $Y_{(i)}$ denotes the observation belonging to $X_{(i)}$, and

$$H(t) = \int_0^t h(s)\,ds.$$

Furthermore, we define

$$\hat F(t) = \sup\{\varphi(t) \mid \varphi \text{ affine and } \forall\, 0 \le s \le n : \varphi(s) \le H(s)\}$$

and

$$\tilde f(t) = \lim_{h \downarrow 0} \frac{\hat F(t) - \hat F(t - h)}{h}.$$

Finally, our estimator of $f_0$ is defined by

$$\hat f(t) = \tilde f(m) \quad\text{with}\quad X_{(m-1)} < t \le X_{(m)}.$$

Here, $X_{(m)}$ is the $m$-th order statistic of $X_1, \ldots, X_n$. In order to control the rate of this estimator, we need some control on how the measurement points behave around 0. We assume the following condition:

(C2) The distribution $G$ of $X_1$ has a density $g$ with respect to the Lebesgue measure in a neighborhood of 0, such that $g$ is continuous in 0 and $g(0) > 0$.

As before, our rate is defined by

$$F_0(r_a) = a r_a, \quad F_0(-r_b) = b r_b \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C n^{-1/2},$$

where $C > 0$ is some fixed constant.
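The construction of the random-design estimator can be sketched in a few lines. This is an illustrative sketch with hypothetical function names: the responses are sorted along the order statistics of the design points, a monotone least squares fit is computed on the index grid $1, \ldots, n$ (here by pool-adjacent-violators, which gives the left-hand slopes of the greatest convex minorant of $H$ at the integers), and the estimator at $t$ is read off at the rank $m$ with $X_{(m-1)} < t \le X_{(m)}$.

```python
import bisect

def pava(y):
    """Non-decreasing least squares fit by pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        while (len(blocks) > 1 and
               blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit

def ls_estimator_random_design(xs, ys, t):
    """Value at t of the monotone LS estimator built on the order statistics of xs."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    x_sorted = [xs[i] for i in order]
    y_sorted = [ys[i] for i in order]  # Y_(i): the response observed at X_(i)
    fit = pava(y_sorted)               # estimator on the index grid 1..n
    # m is the 1-based rank with X_(m-1) < t <= X_(m).
    m = bisect.bisect_left(x_sorted, t) + 1
    if m > len(xs):
        raise ValueError("t lies to the right of all design points")
    return fit[m - 1]

xs = [0.3, -0.5, 0.1, -0.2, 0.8]
ys = [2.0, -1.0, 0.5, 1.0, 3.0]
estimate_at_zero = ls_estimator_random_design(xs, ys, 0.0)
```

In this toy example the sorted responses are $-1, 1, 0.5, 2, 3$; the second and third are pooled to $0.75$, and since $X_{(2)} < 0 \le X_{(3)}$ the estimate at 0 is the third fitted value.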

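Theorem 5.1 With the notations as above, suppose $\mathrm{Var}(\varepsilon_i) = \sigma^2 < +\infty$ and suppose that (C2) holds. Then

$$\limsup_{n \to \infty} P(\hat f(0) > a) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C g(0)^{1/2}}{\sigma} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C g(0)^{1/2}}{\sigma} (s - \psi_r(s)) \right) \right)$$

and

$$\limsup_{n \to \infty} P(\hat f(0) < -b) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C g(0)^{1/2}}{\sigma} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C g(0)^{1/2}}{\sigma} (s - \psi_l(s)) \right) \right).$$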

Proof: We start by bounding $P(\hat f(0) > a)$; the bound for $P(\hat f(0) < -b)$ follows completely analogously. Define $m$ such that $X_{(m-1)} < 0 \le X_{(m)}$; with probability tending to 1 we can assume that $1 < m < n$ (this follows from Condition (C2)). Note that

$$\begin{aligned} \{\hat f(0) > a\} &= \{\tilde f(m) > a\} \\ &= \left\{ \inf_{0 \le t \le m-1} (H(t) - at) < \inf_{m \le t \le n} (H(t) - at) \right\} \\ &= \left\{ \inf_{0 \le t \le m-1} (H(t) - H(m-1) - a(t - m)) < \inf_{m \le t \le n} (H(t) - H(m-1) - a(t - m)) \right\}. \end{aligned}$$
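We again use a similar rescaling to the one we used for the grid model, namely $t = m + n r_a s$ and multiplying left and right with $n^{-1/2} r_a^{-1/2}$. Then $\hat f(0) > a$ precisely when

$$\begin{aligned} &\inf_{-m (r_a n)^{-1} \le s \le -(r_a n)^{-1}} \left( r_a^{-1/2} n^{-1/2} \int_{m-1}^{m + n r_a s} \sum_{i=1}^{n} \varepsilon_{(i)} 1_{\{i-1 < t \le i\}}\,dt + r_a^{-1/2} n^{-1/2} \int_{m-1}^{m + n r_a s} \sum_{i=1}^{n} f_0(X_{(i)}) 1_{\{i-1 < t \le i\}}\,dt - (r_a n)^{1/2} a s \right) < \\ &\inf_{0 \le s \le (n - m)(r_a n)^{-1}} \left( r_a^{-1/2} n^{-1/2} \int_{m-1}^{m + n r_a s} \sum_{i=1}^{n} \varepsilon_{(i)} 1_{\{i-1 < t \le i\}}\,dt + r_a^{-1/2} n^{-1/2} \int_{m-1}^{m + n r_a s} \sum_{i=1}^{n} f_0(X_{(i)}) 1_{\{i-1 < t \le i\}}\,dt - (r_a n)^{1/2} a s \right). \quad (5.1) \end{aligned}$$

As before, we have that

$$s \mapsto r_a^{-1/2} n^{-1/2} \int_{m-1}^{m + n r_a s} \sum_{i=1}^{n} \varepsilon_{(i)} 1_{\{i-1 < t \le i\}}\,dt$$

converges to $\sigma W_s$, with $W_s$ two-sided standard Brownian motion. Also, for $i \le m - 1$, $f_0(X_{(i)}) \le 0$. Finally, suppose that $r_a \ge \eta > 0$ for all $n \ge 1$. Then $f_0 = 0$ on $[0, \eta]$, and it becomes very easy to bound the right-hand side of (5.1) if we limit $s$ to this interval, which would get us the desired result (in this case we would have a parametric rate). Now assume that $r_a \to 0$; then we need that

$$(n - m)(r_a n)^{-1} \to +\infty.$$

This is true with probability 1, since with probability 1

$$m/n \to P(X_1 < 0) < 1.$$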
Furthermore, and most importantly, we need to bound for < s < g{0) 

r-m+nras " m+\nras\ 
j /o(-'^{i))l{i-l<t<j} dt < ^ /o(X(j)). 

Define k = [nr^s] +1 and D = X(^^j^f,y When we condition on D, we know that . . . 

is an iid sample from G restricted to [0, -D]. This implies, using Chebyshev, 



I m+fc-l ^1/2 



^1/2 



E /o(^«)-G([o,D]) 









r foit)dG{t) 


> A 




Jo 







f^'' foitfdGjt) 
G{[0,D])X^ 



< fo{Df\-\ 



It is not hard to see that D — > almost surely when n — > -|-oo, uniformly for s £ [0,g{0)], which 
proves that 

m+\nras] 1/2 i /o „n 



Ta^l^n-^l^ J2 /o(^») = ^([^/ Mt)dG{t) + o,{l). (5.2) 
Now $D$ is the position of the $(k+1)$-th sample point after 0, and since $k \to \infty$ and $k/n \to 0$, it is not hard to see, keeping in mind Condition (C2), that



D 



Therefore, 
and 

1/2 1/2 

So (j5.2p becomes 



5(0) 

G([0,D]) =r,s(l + Op(l)) 
2raS 



rgs 
9(0) 



Ut)dG{t) 



</o 



5(0) 



ff(0)(l+Op(l)) = Op(l). 



m+\nra s] 



^ /o(X(,)) = r-V2„V2^(o)Fo j (1 + o,(l)) + 0,(1). 



Now we use Lemma 2.1: for any $\tau \in (0,1)$, there exists a continuous increasing function $\eta$ on $[0,1]$ with $F_0(t) = 0 \Rightarrow \eta(t) = 0$, such that for $0 \le s \le \tau g(0)$



9(0) 



< Ms/9{0))Foira) +vira)Foira). 



So finally we conclude that 



limsupF(/(0) > a) < ¥ { inf {aWs - Cs) < inf aWs - C{s - g{0)'iljr{s/g{0))) 

5<0 0<s<Tg(0) 

;„^^_c^^ ^^_cm^ ^^^^ ) 

s<o a o<s<T a I 



Since this holds for any $\tau \in (0,1)$, the theorem follows. □



5.1 Optimality of the rate 

We have an analogue to Theorem 4.2 for this setting as well:

Theorem 5.2 Suppose Conditions (C1) and (C2) hold. Choose two significance levels $\alpha \in (0,1)$ and $\beta \in (0,1/2)$. There exists $\eta > 0$, such that for all $n$ large enough, we can find a monotone increasing function $f_1$ (close to $f_0$), and we can find a rate $\gamma_n$ with

$$\limsup_{n \to \infty}\, \max_{i=0,1} P_{f_i}\left(|\hat f(0) - f_i(0)| > \gamma_n\right) < \alpha$$

and

$$\liminf_{n \to \infty}\, \inf_{\theta}\, \max_{i=0,1} P_{f_i}\left(|\theta(Y, X) - f_i(0)| > \eta \cdot \gamma_n\right) > \beta,$$

where $\theta(Y, X)$ is any estimator of $f(0)$ based on the data $(Y, X)$.
Proof: We can follow the proof of Theorem 4.2 (and of Theorem 3.7), choosing the same alternative function $f_1$, also using the steps in the proof of Theorem 5.1 for the alternative $f_1$, right up to the point where we need to bound $\|P_1 - P_0\|_1$. In the random design case, our data consists of $Y$ and $X$, but when we condition on $X$, we can use the inequalities (4.3) and (4.4), just by replacing $x_j$ by $X_j$. The only difference is that the number $N$ of $X_j$'s in the interval $[0, s_{\delta a}]$ is random. However, we have excellent control on $N$, and by looking at Equation (4.5), we can see that the relevant bound is given by

E(\/iV) < Arang{0) for all n big enough. 

Our conclusion is again that we can choose $\delta > 0$ such that $\|P_1 - P_0\|_1 < 2 - 4\beta$, after which we can follow the proof of Theorem 3.7. □



6 The Grenander estimator for monotone densities 

In this final section we wish to show that our methods also work for the Grenander estimator of a monotone density. Consider a sample $X_1, \ldots, X_n$ from a monotone decreasing density $f_0$ on $[-1, \infty)$. Assume that $f_0$ is continuous in 0; we wish to estimate $f_0(0)$. Let $F_n$ denote the empirical distribution function of the sample $X_1, \ldots, X_n$. Define

$$\hat F_n(t) = \inf\{\varphi(t) \mid \varphi \text{ affine and } \forall\, s \ge -1 : \varphi(s) \ge F_n(s)\},$$

so $\hat F_n$ is the smallest concave majorant of $F_n$. The Grenander estimator is now defined as

$$\hat f(t) = \lim_{h \downarrow 0} \frac{\hat F_n(t + h) - \hat F_n(t)}{h}.$$
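The definition translates directly into a small computation. The sketch below is illustrative and rests on assumptions that are not in the paper: all sample points are taken to be strictly larger than the left endpoint `lower` of the support, and the function names are hypothetical. It builds the upper convex hull of the points $(\text{lower}, 0)$ and $(X_{(i)}, i/n)$, which is the smallest concave majorant of $F_n$, and returns its right-hand derivative.

```python
def grenander(sample, lower):
    """Knots and slopes of the least concave majorant of the ECDF on [lower, inf)."""
    xs = sorted(sample)
    n = len(xs)
    if xs[0] <= lower:
        raise ValueError("sample points must be strictly larger than lower")
    # Upper corners of the ECDF: (lower, 0) and (X_(i), i/n).
    pts = [(lower, 0.0)]
    for i, x in enumerate(xs, start=1):
        if x == pts[-1][0]:
            pts[-1] = (x, i / n)  # ties: keep the highest jump
        else:
            pts.append((x, i / n))
    hull = [pts[0]]
    for p in pts[1:]:
        # Pop while the last vertex lies on or below the chord to p.
        while len(hull) > 1:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            if (y1 - y0) * (p[0] - x1) <= (p[1] - y1) * (x1 - x0):
                hull.pop()
            else:
                break
        hull.append(p)
    knots = [x for x, _ in hull]
    slopes = [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(hull, hull[1:])]
    return knots, slopes

def grenander_at(knots, slopes, t):
    """Right-hand derivative of the concave majorant at t."""
    for left, right, s in zip(knots, knots[1:], slopes):
        if left <= t < right:
            return s
    return 0.0  # outside the knots the estimated density is zero

knots, slopes = grenander([0.5, 0.6, 0.7, 2.0], lower=0.0)
density_at_zero = grenander_at(knots, slopes, 0.0)
```

In this example the first three points are pooled into one segment from $(0, 0)$ to $(0.7, 0.75)$, so the estimate at 0 is $0.75/0.7$; in the setting of this section one would call `grenander(sample, lower=-1.0)`.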

To find the rate of the Grenander estimator, we define

$$F_0(t) = \int_0^t (f_0(0) - f_0(s))\,ds. \quad (6.1)$$

This is a convex function such that $F_0(0) = 0$. Since $f_0$ is decreasing, instead of increasing, when considering the event $\{\hat f(0) > f_0(0) + a\}$, we have to look to the left, instead of the right. This results in reversed rate-equations: define $a, b > 0$ such that

$$F_0(r_b) = b r_b, \quad F_0(-r_a) = a r_a \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C n^{-1/2},$$

for some fixed $C > 0$. Again we define

no ^o{t) m ^o{t) 

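We have the following theorem:

Theorem 6.1 With the notations as above, we have that if $r_a \to 0$ and $r_b \to 0$,

$$\limsup_{n \to \infty} P(\hat f(0) - f_0(0) > a) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sqrt{f_0(0)}} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C}{\sqrt{f_0(0)}} (s - \psi_l(s)) \right) \right)$$

and

$$\limsup_{n \to \infty} P(\hat f(0) - f_0(0) < -b) \le P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sqrt{f_0(0)}} s \right) < \inf_{0 \le s \le 1} \left( W_s - \frac{C}{\sqrt{f_0(0)}} (s - \psi_r(s)) \right) \right).$$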

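Proof: As before, we will only show how to bound $P(\hat f(0) - f_0(0) > a)$ (in fact, this corresponds to the inequality for $b$ in the other proofs). Define

$$F(t) = \int_0^t f_0(s)\,ds$$

and introduce the notation $F_n(0, t] = F_n(t) - F_n(0)$, and likewise $F(0, t]$ (which is in fact equal to $F(t)$). Note that

$$\begin{aligned} \{\hat f(0) > f_0(0) + a\} &= \left\{ \inf_{-1 \le t \le 0} (f_0(0) t + at - F_n(t)) > \inf_{t \ge 0} (f_0(0) t + at - F_n(t)) \right\} \\ &= \left\{ \inf_{-1 \le t \le 0} (f_0(0) t + at - F_n(0, t]) > \inf_{t \ge 0} (f_0(0) t + at - F_n(0, t]) \right\} \\ &= \left\{ \inf_{-1 \le t \le 0} (F_0(t) + at + F(0, t] - F_n(0, t]) > \inf_{t \ge 0} (F_0(t) + at + F(0, t] - F_n(0, t]) \right\}. \end{aligned}$$

We choose the scaling $t = r_a s$ and multiply left and right with $n^{1/2} r_a^{-1/2}$ to get that $\hat f(0) > f_0(0) + a$ precisely when

$$\begin{aligned} &\inf_{-r_a^{-1} \le s \le 0} \left( n^{1/2} r_a^{-1/2} F_0(r_a s) + Cs - n^{1/2} r_a^{-1/2} (F_n(0, r_a s] - F(0, r_a s]) \right) > \\ &\inf_{s \ge 0} \left( n^{1/2} r_a^{-1/2} F_0(r_a s) + Cs - n^{1/2} r_a^{-1/2} (F_n(0, r_a s] - F(0, r_a s]) \right). \quad (6.2) \end{aligned}$$

Again we use Lemma 2.1, but now for the function $\psi_l$: for any $\tau \in (0,1)$, there exists a continuous increasing function $\eta$ on $[0,1]$ with $F_0(t) = 0 \Rightarrow \eta(t) = 0$, such that for $-\tau \le s \le 0$

$$F_0(r_a s) \le \psi_l(-s) F_0(-r_a) + \eta(r_a) F_0(-r_a).$$

We conclude that $\hat f(0) > f_0(0) + a$ implies

$$\begin{aligned} &\inf_{-\tau \le s \le 0} \left( -n^{1/2} r_a^{-1/2} (F_n(0, r_a s] - F(0, r_a s]) + C(s + \psi_l(-s)) \right) + C \eta(r_a) > \\ &\inf_{s \ge 0} \left( Cs - n^{1/2} r_a^{-1/2} (F_n(0, r_a s] - F(0, r_a s]) \right). \end{aligned}$$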

What remains is to show that if $r_a \to 0$, the process

$$Y_n : s \mapsto n^{1/2} r_a^{-1/2} (F_n(0, r_a s] - F(0, r_a s]) = n^{-1/2} r_a^{-1/2} \sum_{i=1}^{n} \left( 1_{\{X_i \in (0, r_a s]\}} - F(0, r_a s] \right)$$

converges in distribution, in the topology of uniform convergence on compacta, to $f_0(0)^{1/2} W_s$, where $W_s$ is two-sided standard Brownian motion. It seems that the classical approach to this problem is the easiest one: the fact that the finite dimensional marginal distributions converge is a relatively straightforward application of the Central Limit Theorem for triangular arrays, since we have written the process as a rescaled sum of independent zero-mean variables; it uses the fact that $F'(0) = f_0(0)$. For tightness of the sequence $Y_n$ it suffices to show that for all $s_1 \le s \le s_2$ in a compact set, there exists a constant $M > 0$ such that

$$E\left( (Y_n(s) - Y_n(s_1))^2 (Y_n(s_2) - Y_n(s))^2 \right) \le M (s_2 - s_1)^2. \quad (6.3)$$

It is not hard to see that the only relevant terms after taking the expectation are

$$n^{-2} r_a^{-2}\, E\left( \left( 1_{\{X_i \in (r_a s_1, r_a s]\}} - F(r_a s_1, r_a s] \right)^2 \left( 1_{\{X_j \in (r_a s, r_a s_2]\}} - F(r_a s, r_a s_2] \right)^2 \right) \quad\text{with } i \ne j,$$

of which there are of the order $n^2$. Since $f_0$ is bounded in a neighborhood of 0, we can find a constant $M > 0$ such that for $n$ big enough,

$$F(r_a s_1, r_a s_2] \le M r_a (s_2 - s_1).$$

This leads to (6.3). We can finally conclude that

$$\begin{aligned} \limsup_{n \to \infty} P\left( \hat f(0) > f_0(0) + a \right) &\le P\left( \inf_{-\tau \le s \le 0} \left( f_0(0)^{1/2} W_s + C(s + \psi_l(-s)) \right) > \inf_{s \ge 0} \left( Cs + f_0(0)^{1/2} W_s \right) \right) \\ &= P\left( \inf_{s \le 0} \left( W_s - \frac{C}{\sqrt{f_0(0)}} s \right) < \inf_{0 \le s \le \tau} \left( W_s - \frac{C}{\sqrt{f_0(0)}} (s - \psi_l(s)) \right) \right). \end{aligned}$$

Since this holds for any $\tau \in (0,1)$, we have proved the theorem. □

When $r_a \to r_0 > 0$, the process $Y_n(s)$ does not converge to Brownian motion, but to a rescaled Brownian bridge, depending on $F_0$. However, we would still have that when $C \to \infty$,

$$P(\hat f(0) - f_0(0) > a) \to 0,$$

so $a$ is still the correct rate (in this case the parametric rate).

6.1 Optimality of the rate 

In the monotone decreasing density case we also wish to show that the Grenander estimator has the by now familiar optimality property.

Theorem 6.2 Choose two significance levels $\alpha \in (0,1)$ and $\beta \in (0,1/2)$. There exists $\eta > 0$, such that for all $n$ large enough, we can find a monotone decreasing density $f_1$ on $[-1, \infty)$ (close to $f_0$), and we can find a rate $\gamma_n$ with

$$\limsup_{n \to \infty}\, \max_{i=0,1} P_{f_i}\left(|\hat f(0) - f_i(0)| > \gamma_n\right) < \alpha$$

and

$$\liminf_{n \to \infty}\, \inf_{\theta}\, \max_{i=0,1} P_{f_i}\left(|\theta(Y) - f_i(0)| > \eta \cdot \gamma_n\right) > \beta,$$

where $\theta(Y)$ is any estimator of $f(0)$ based on the data $Y$.
Proof: The proof is very similar to the previous ones, but we need to be more careful when choosing the alternative. Choose $n$ large enough such that the equations

$$F_0(r_b) = b r_b, \quad F_0(-r_a) = a r_a \quad\text{and}\quad r_a^{1/2} a = r_b^{1/2} b = C n^{-1/2}$$

have solutions for some fixed $C > 0$ with

$$\frac{2\sqrt{f_0(0)}}{\sqrt{2\pi}\,C} < \alpha.$$

Here, $F_0$ is defined in (6.1). Suppose that for this $n$, $a \ge b$. The case $a < b$ can be handled analogously. Define for some fixed $0 < \delta < 1$

$$f_1(t) = \begin{cases} f_0(0) + \delta a & \text{if } t \le 0 \text{ and } f_0(t) < f_0(0) + \delta a + \eta_a, \\ f_0(t) - \eta_a & \text{if } t \le 0 \text{ and } f_0(t) \ge f_0(0) + \delta a + \eta_a, \\ f_0(t) & \text{if } t > 0. \end{cases}$$

Then $f_1$ is a monotone decreasing density, if we choose $\eta_a$ such that $\int f_1(t)\,dt = 1$. This is always possible for $n$ big enough, unless $r_a \to r_0 > 0$ (i.e., unless $f_0$ is constant on $[-r_0, 0]$). However, in this case $\hat f(0)$ estimates $f_0(0)$ with a parametric rate (since we consider $a \ge b$), so the conclusions of the theorem will follow. From now on we will assume that $r_a \to 0$. If $b > a$, we only define $f_1(t) = f_0(t) + \eta_a$ for $t \le 1$ and $f_0(t) \le f_0(0) - \delta a - \eta_a$; for $t > 1$ we would define $f_1(t) = f_0(t)$.
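Define

$$s_{\delta a} = \inf\{t \ge 0 : f_0(-t) \ge f_0(0) + \delta a\}.$$

We have seen before that $s_{\delta a} \le r_a$. Also,

$$\int_{-s_{\delta a}}^{0} (f_1(t) - f_0(t))\,dt \le \delta a\, s_{\delta a}.$$

This gives us an upper bound for $\eta_a$: if $n$ is big enough, such that $f_0(-1/2) \ge f_0(0) + \delta a + \eta_a$, then

$$\delta a\, s_{\delta a} \ge \int_{-1}^{-s_{\delta a}} (f_0(t) - f_1(t))\,dt \ge \int_{-1}^{-1/2} (f_0(t) - f_1(t))\,dt = \tfrac12 \eta_a,$$

so we conclude that for $n$ big enough

$$\eta_a \le 2 \delta a r_a.$$

Now define

$$\gamma_n = 2a.$$

Then Theorem 6.1 shows that (remember that $a \ge b$)

$$P_{f_0}\left(|\hat f(0) - f_0(0)| > \gamma_n\right) < \alpha.$$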



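From the way we defined $f_1$, it is clear that we can define $X^{(1)} \sim f_1$ and couple it to $X \sim f_0$, such that $X^{(1)} = X$ if $X > 0$, and $X \le X^{(1)} \le 0$ otherwise. So if we consider the empirical distribution functions of two samples of $X$ and $X^{(1)}$, call them $F_n$ and $F_n^{(1)}$, then $F_n^{(1)}(t) = F_n(t)$ if $t \ge 0$, and $F_n^{(1)}(t) \le F_n(t)$ if $t \in [-1, 0]$. Writing $\hat f^{(1)}$ for the Grenander estimator based on the coupled sample, now note that

$$\begin{aligned} \{\hat f^{(1)}(0) < f_1(0) - 2a\} &= \left\{ \inf_{-1 \le t \le 0} (f_1(0) t - 2at - F_n^{(1)}(t)) < \inf_{t \ge 0} (f_1(0) t - 2at - F_n^{(1)}(t)) \right\} \\ &\subset \left\{ \inf_{-1 \le t \le 0} (f_1(0) t - 2at - F_n(t)) < \inf_{t \ge 0} (f_1(0) t - 2at - F_n(t)) \right\} \\ &= \{\hat f(0) - f_1(0) < -2a\}. \end{aligned}$$

So we get, using that $\delta < 1$ and $f_1(0) = f_0(0) + \delta a$,

$$P_{f_1}\left( \hat f(0) - f_1(0) < -2a \right) \le P_{f_0}\left( \hat f(0) - f_0(0) < -a \right) \le \frac{\sqrt{f_0(0)}}{\sqrt{2\pi}\,C}.$$

Now we focus on $P_{f_1}\left( \hat f(0) - f_0(0) > (2 + \delta) a \right)$, that is, on the event $\{\hat f(0) > f_1(0) + 2a\}$. Define

$$F_1(t) = \int_0^t (f_1(0) - f_1(s))\,ds$$

and, with a slight abuse of notation,

$$F^{(1)}(t) = \int_0^t f_1(s)\,ds.$$

We can use Equation (6.2) for the situation where the underlying function is $f_1$, using the coupled sample $X_1^{(1)}, \ldots, X_n^{(1)}$: $\hat f(0) > f_1(0) + 2a$ precisely when

$$\begin{aligned} &\inf_{-r_a^{-1} \le s \le 0} \left( n^{1/2} r_a^{-1/2} F_1(r_a s) + 2Cs - n^{1/2} r_a^{-1/2} (F_n^{(1)}(0, r_a s] - F^{(1)}(0, r_a s]) \right) > \\ &\inf_{s \ge 0} \left( n^{1/2} r_a^{-1/2} F_1(r_a s) + 2Cs - n^{1/2} r_a^{-1/2} (F_n^{(1)}(0, r_a s] - F^{(1)}(0, r_a s]) \right). \end{aligned}$$

Note that since $F_1$ is convex, $F_1(r_a s) \le -F_1(-r_a) s$ for $-1 \le s \le 0$, and that $F_1(-r_a) \le F_0(-r_a) + r_a \eta_a \le 2 a r_a$. Furthermore, for $s \ge 0$, $F_1(r_a s) \ge 0$. This means that $\hat f(0) > f_1(0) + 2a$ implies that

$$\inf_{-1 \le s \le 0} \left( -n^{1/2} r_a^{-1/2} (F_n^{(1)}(0, r_a s] - F^{(1)}(0, r_a s]) \right) > \inf_{s \ge 0} \left( Cs - n^{1/2} r_a^{-1/2} (F_n^{(1)}(0, r_a s] - F^{(1)}(0, r_a s]) \right).$$

Since the left-derivative of $F^{(1)}$ in 0 equals $f_0(0) + o(1)$, we can proceed as in the proof of Theorem 6.1 to conclude that our rate $\gamma_n$ satisfies the first requirement of the theorem.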
Now we need to bound $\|P_1 - P_0\|_1$, where $P_0$ is the distribution of $(X_1, \ldots, X_n)$ and $P_1$ the distribution of $(X_1^{(1)}, \ldots, X_n^{(1)})$. We do this by bounding the Hellinger distance between $f_0$ and $f_1$:







2 



< 



< 



2/o(0) 



For the last inequality we use that for n big enough, < 1/2. In the case where b > a, you could 
use the fact that for n big enough, fo{rb) > /o(0)/2, to get the first inequality (with a different 
constant). It now follows that 



\\Pi - PoWl < 2VH^puPo) < 2VnF2(/o,/i) < C6/^/M^. 

Choose $\delta \in (0,1]$ small enough, such that $\|P_1 - P_0\|_1 < 2 - 4\beta$, and follow the proof of Theorem 3.7.

□